{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Capital Bikeshare Bike-Sharing Demand Prediction: Linear Regression Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import the required packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np  # array and matrix operations\n",
    "import pandas as pd  # data handling\n",
    "\n",
    "from sklearn.metrics import r2_score  # scores regression model performance\n",
    "from sklearn.utils import shuffle\n",
    "\n",
    "import matplotlib.pyplot as plt  # plotting\n",
    "import seaborn as sns  # plotting\n",
    "\n",
    "# Render figures inline below the code cell rather than in a new window\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Data exploration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data exploration was done in a separate notebook, so this notebook moves straight on to the next step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Data preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(731, 16)"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read the CSV data with pandas\n",
    "data_raw = pd.read_csv(\"Bike-Sharing-Dataset/day.csv\")\n",
    "data_raw.shape"
   ]
  },
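  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Data preprocessing / feature engineering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 Remove unneeded features and split the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the data exploration and analysis, the features that are not needed are first removed from the raw data; five attributes are dropped here. The year attribute (`yr`) also has to be removed, but it is still needed below to split the data into training and test sets, so it is kept for now."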
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(731, 11)"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop the record index\n",
    "data_drop_instant = data_raw.drop('instant', axis = 1)\n",
    "# Drop the date\n",
    "data_drop_dteday = data_drop_instant.drop('dteday', axis = 1)\n",
    "# Drop the casual (non-registered) user count\n",
    "data_drop_casual = data_drop_dteday.drop('casual', axis = 1)\n",
    "# Drop the registered user count\n",
    "data_drop_registered = data_drop_casual.drop('registered', axis = 1)\n",
    "# temp and atemp are strongly linearly correlated (coefficient close to 1), so drop one of them\n",
    "data_drop_temp = data_drop_registered.drop('temp', axis = 1)\n",
    "\n",
    "data = data_drop_temp\n",
    "data.shape"
   ]
  },
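  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data left after removing the unneeded features is split into a training set and a test set: the 2011 data (`yr == 0`) is used for training and the 2012 data (`yr == 1`) for testing. The year attribute is then dropped from both."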
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(365, 11)"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train0 = data[data.yr==0]\n",
    "data_test0 = data[data.yr==1]\n",
    "\n",
    "data_train0.shape\n",
    "#data_test0.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(365, 10)"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop the year column\n",
    "data_train = data_train0.drop('yr', axis = 1)\n",
    "data_test = data_test0.drop('yr', axis = 1)\n",
    "\n",
    "data_train.shape\n",
    "#data_test.shape"
   ]
  },
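  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 Data preprocessing / feature engineering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.1 Remove outliers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data analysis shows that some features in the training set contain outliers. If they are left in, they may distort the distribution of the residuals between predictions and true values, that is, hurt prediction accuracy. The test set is used only for the final evaluation, so no samples should be removed from it. The thresholds can be estimated from scatter plots and histograms."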
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(357, 10)"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop samples whose cnt is 5700 or higher\n",
    "data_train = data_train[data_train.cnt < 5700]\n",
    "# Drop samples whose hum is 0.25 or lower\n",
    "data_train = data_train[data_train.hum > 0.25]\n",
    "# Drop samples whose windspeed is 0.4 or higher\n",
    "data_train = data_train[data_train.windspeed < 0.4]\n",
    "\n",
    "data_train.shape"
   ]
  },
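  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.2 Shuffle the training set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To randomize the sample order, the training data is shuffled. The test set should in principle not be processed and takes no part in training, so it is not shuffled."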
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>season</th>\n",
       "      <th>mnth</th>\n",
       "      <th>holiday</th>\n",
       "      <th>weekday</th>\n",
       "      <th>workingday</th>\n",
       "      <th>weathersit</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "      <th>cnt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.363625</td>\n",
       "      <td>0.805833</td>\n",
       "      <td>0.160446</td>\n",
       "      <td>985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.353739</td>\n",
       "      <td>0.696087</td>\n",
       "      <td>0.248539</td>\n",
       "      <td>801</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.189405</td>\n",
       "      <td>0.437273</td>\n",
       "      <td>0.248309</td>\n",
       "      <td>1349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.212122</td>\n",
       "      <td>0.590435</td>\n",
       "      <td>0.160296</td>\n",
       "      <td>1562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.229270</td>\n",
       "      <td>0.436957</td>\n",
       "      <td>0.186900</td>\n",
       "      <td>1600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   season  mnth  holiday  weekday  workingday  weathersit     atemp       hum  \\\n",
       "0       1     1        0        6           0           2  0.363625  0.805833   \n",
       "1       1     1        0        0           0           2  0.353739  0.696087   \n",
       "2       1     1        0        1           1           1  0.189405  0.437273   \n",
       "3       1     1        0        2           1           1  0.212122  0.590435   \n",
       "4       1     1        0        3           1           1  0.229270  0.436957   \n",
       "\n",
       "   windspeed   cnt  \n",
       "0   0.160446   985  \n",
       "1   0.248539   801  \n",
       "2   0.248309  1349  \n",
       "3   0.160296  1562  \n",
       "4   0.186900  1600  "
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(357, 10)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train=shuffle(data_train)\n",
    "data_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>season</th>\n",
       "      <th>mnth</th>\n",
       "      <th>holiday</th>\n",
       "      <th>weekday</th>\n",
       "      <th>workingday</th>\n",
       "      <th>weathersit</th>\n",
       "      <th>atemp</th>\n",
       "      <th>hum</th>\n",
       "      <th>windspeed</th>\n",
       "      <th>cnt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>225</th>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.624388</td>\n",
       "      <td>0.817500</td>\n",
       "      <td>0.222633</td>\n",
       "      <td>3820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>154</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.594696</td>\n",
       "      <td>0.456250</td>\n",
       "      <td>0.123142</td>\n",
       "      <td>5342</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.612379</td>\n",
       "      <td>0.729583</td>\n",
       "      <td>0.230092</td>\n",
       "      <td>4758</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>148</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.615550</td>\n",
       "      <td>0.818750</td>\n",
       "      <td>0.213938</td>\n",
       "      <td>4788</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>163</th>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.601654</td>\n",
       "      <td>0.494583</td>\n",
       "      <td>0.305350</td>\n",
       "      <td>5020</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     season  mnth  holiday  weekday  workingday  weathersit     atemp  \\\n",
       "225       3     8        0        0           0           2  0.624388   \n",
       "154       2     6        0        6           0           1  0.594696   \n",
       "147       2     5        0        6           0           1  0.612379   \n",
       "148       2     5        0        0           0           1  0.615550   \n",
       "163       2     6        0        1           1           1  0.601654   \n",
       "\n",
       "          hum  windspeed   cnt  \n",
       "225  0.817500   0.222633  3820  \n",
       "154  0.456250   0.123142  5342  \n",
       "147  0.729583   0.230092  4758  \n",
       "148  0.818750   0.213938  4788  \n",
       "163  0.494583   0.305350  5020  "
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check that the data has been shuffled\n",
    "data_train.head()"
   ]
  },
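  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.3 Split input features and target values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the training set and the test set into input features X and target values y.\n",
    "\n",
    "Expected shapes (y becomes a column vector only after the reshape in 4.2.4):\n",
    "X_train (357, 9);\n",
    "y_train (357, 1);\n",
    "X_test (366, 9);\n",
    "y_test (366, 1);"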
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(357, 9)"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_train = data_train['cnt'].values\n",
    "X_train = data_train.drop('cnt', axis = 1)\n",
    "\n",
    "# Keep the column names to label the weight coefficients later\n",
    "columns = X_train.columns\n",
    "\n",
    "y_test = data_test['cnt'].values\n",
    "X_test = data_test.drop('cnt', axis = 1)\n",
    "\n",
    "X_train.shape\n",
    "#X_test.shape"
   ]
  },
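  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4 Standardization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data exploration shows that the features have quite different value ranges, so the data needs to be standardized: subtract the mean, then divide by the standard deviation.\n",
    "\n",
    "Standardizing the raw features gives every feature dimension zero mean and unit variance. This reduces the differences in value range, makes the learned weights more comparable in magnitude so that predictions are not dominated by features with large values, and allows the coefficients to be used later as a measure of feature importance."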
   ]
  },
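  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of what standardization does, the z-score can be computed by hand and compared with `StandardScaler`. The values below are toy numbers rather than rows from `day.csv`, and the names `X_toy`, `X_manual` and `X_scaled` are illustrative only.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Toy feature matrix with two columns on very different scales (hypothetical values)\n",
    "X_toy = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])\n",
    "\n",
    "# Manual z-score: subtract the column mean, divide by the column standard deviation\n",
    "X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)\n",
    "\n",
    "# StandardScaler also uses the population standard deviation (ddof=0)\n",
    "X_scaled = StandardScaler().fit_transform(X_toy)\n",
    "\n",
    "print(np.allclose(X_manual, X_scaled))\n",
    "print(X_scaled.mean(axis=0), X_scaled.std(axis=0))\n",
    "```\n",
    "\n",
    "Both columns end up with zero mean and unit standard deviation, which is why the division has to use the standard deviation and not the variance."
   ]
  },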
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\utils\\validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.\n",
      "  warnings.warn(msg, DataConversionWarning)\n"
     ]
    }
   ],
   "source": [
    "# Standardize the data\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Create separate scalers for the features and the target\n",
    "ss_X = StandardScaler()\n",
    "ss_y = StandardScaler()\n",
    "\n",
    "# Fit the scaler on the training features only, then apply it to the test features\n",
    "X_train = ss_X.fit_transform(X_train)\n",
    "X_test = ss_X.transform(X_test)\n",
    "\n",
    "# Standardizing y keeps the weights w on a similar scale across problems\n",
    "# and narrows the useful range of the regularization parameter\n",
    "y_train = ss_y.fit_transform(y_train.reshape(-1, 1))\n",
    "y_test = ss_y.transform(y_test.reshape(-1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Choose the model type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 Try linear regression with default parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>abs_coef</th>\n",
       "      <th>columns</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>[0.6444774183345154]</td>\n",
       "      <td>atemp</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[0.2932870174143726]</td>\n",
       "      <td>season</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>[0.19883735613055464]</td>\n",
       "      <td>weathersit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>[0.09730252159431736]</td>\n",
       "      <td>windspeed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>[0.09249631254573863]</td>\n",
       "      <td>hum</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[0.05380930040104529]</td>\n",
       "      <td>holiday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[0.03036364260009024]</td>\n",
       "      <td>weekday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[0.013813330209271746]</td>\n",
       "      <td>workingday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[0.009025595984837709]</td>\n",
       "      <td>mnth</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 abs_coef     columns\n",
       "6    [0.6444774183345154]       atemp\n",
       "0    [0.2932870174143726]      season\n",
       "5   [0.19883735613055464]  weathersit\n",
       "8   [0.09730252159431736]   windspeed\n",
       "7   [0.09249631254573863]         hum\n",
       "2   [0.05380930040104529]     holiday\n",
       "3   [0.03036364260009024]     weekday\n",
       "4  [0.013813330209271746]  workingday\n",
       "1  [0.009025595984837709]        mnth"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Linear regression\n",
    "# Signature in the scikit-learn version used here (newer releases no longer have `normalize`):\n",
    "# class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "# Initialize with the default configuration\n",
    "lr = LinearRegression()\n",
    "\n",
    "# Fit the model parameters\n",
    "lr.fit(X_train, y_train)\n",
    "\n",
    "# Predict on the test and training sets with the fitted model\n",
    "y_test_pred_lr = lr.predict(X_test)\n",
    "y_train_pred_lr = lr.predict(X_train)\n",
    "\n",
    "# Inspect the weight of each feature; the absolute value of a coefficient\n",
    "# can be read as that feature's importance\n",
    "coef_list = lr.coef_.T\n",
    "coef_abs_list = map(abs, coef_list)\n",
    "fs = pd.DataFrame({\"columns\":list(columns), \"abs_coef\":list(coef_abs_list)})\n",
    "fs.sort_values(by=['abs_coef'],ascending=False)"
   ]
  },
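  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once preprocessed, the data can be fed to the model. The linear regression uses its default configuration: `normalize` is left at `False` because the data has already been standardized, and because the dataset is small, parallel processing (`n_jobs`) is not enabled.\n",
    "\n",
    "After the model parameters have been fitted, the model is used to predict on both the training set and the test set, and its coefficients are listed to show how much each feature contributes to the prediction. The absolute value of each coefficient is taken: the larger it is, the more the corresponding feature influences the prediction.\n",
    "\n",
    "The apparent temperature (`atemp`) clearly has the largest influence on the prediction, while the month (`mnth`) and working-day (`workingday`) features have very little; with L1 regularization these two features would probably be dropped."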
   ]
  },
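  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The remark about L1 regularization can be illustrated with a small synthetic example. This is only a sketch: the data, the names `X_demo` and `y_demo`, and the choice `alpha=0.1` are assumptions made for illustration and are not part of this notebook's pipeline.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import Lasso\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "X_demo = rng.randn(200, 3)\n",
    "# Only the first two synthetic features drive the target; the third is pure noise\n",
    "y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.1 * rng.randn(200)\n",
    "\n",
    "# L1-regularized linear regression; alpha controls the strength of the penalty\n",
    "lasso = Lasso(alpha=0.1)\n",
    "lasso.fit(X_demo, y_demo)\n",
    "\n",
    "# One coefficient per feature, in column order\n",
    "print(lasso.coef_)\n",
    "```\n",
    "\n",
    "With a penalty of this kind, the coefficient of an uninformative feature is typically shrunk to zero or very close to it, which is what would be expected for `mnth` and `workingday` here."
   ]
  },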
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of LinearRegression on test is -0.7058098049719741\n",
      "The r2 score of LinearRegression on train is 0.7598525446307836\n"
     ]
    }
   ],
   "source": [
    "# Evaluate the model on the test and training sets with r2_score and print the results\n",
    "# Test set\n",
    "print('The r2 score of LinearRegression on test is', r2_score(y_test, y_test_pred_lr))\n",
    "# Training set\n",
    "print('The r2 score of LinearRegression on train is', r2_score(y_train, y_train_pred_lr))"
   ]
  },
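  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluating the model with `r2_score` on the test and training sets shows the following:\n",
    "\n",
    "The R² score on the training set is fairly high (about 0.76), so the model fits the training data reasonably well, but on the test set it is negative (about -0.71), which was quite a shock (ΩДΩ). R² is defined as 1 minus the ratio of the squared error between predictions and true values to the squared deviation of the true values from their own mean. A negative score therefore means that the prediction error is larger than the spread of the true values themselves: the model does worse than always predicting the test-set mean. The earlier data analysis shows that the test-set target is roughly normally distributed but is, overall, much larger than the training-set target. So the current situation is:\n",
    "\n",
    "1. The test-set target values are much larger than the training-set ones, and the two distributions have different shapes.\n",
    "\n",
    "2. The features used for training, such as season, month, holiday, weekday, working day and weather, are distributed almost identically in the training and test sets (I compared them with separate histograms, although the data-analysis notebook only shows the training set); humidity differs somewhat.\n",
    "\n",
    "So, with similar features and the same model, the test-set predictions should be close to the training-set true values; because of point 1, however, they end up far from the test-set true values.\n",
    "\n",
    "Three possible explanations: (1) the target values for either 2011 or 2012 are faulty; (2) most of the listed features may have little real influence on the outcome. According to the data analysis, the registered-user count is strongly correlated with the rental count, but the assignment required removing it (note that `cnt` is the sum of `casual` and `registered`, so keeping it would leak the target), which left the model without its only strongly correlated feature; (3) the factors that make the 2011 and 2012 rental counts differ may not have been recorded in the dataset, for example a 2012 campaign promoting green travel that raised rentals under otherwise identical conditions."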
   ]
  },
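  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the negative score less surprising, here is a minimal sketch of the R² computation on toy numbers (the values and the names `y_true`, `y_pred`, `ss_res`, `ss_tot` are illustrative assumptions). The predictions have exactly the right spread but sit at a level that is systematically too low, which mimics a model trained on 2011 and tested on the higher 2012 counts.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.metrics import r2_score\n",
    "\n",
    "y_true = np.array([3.0, 4.0, 5.0, 6.0])\n",
    "# Same shape as the truth, but shifted down by a constant offset\n",
    "y_pred = y_true - 3.0\n",
    "\n",
    "ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares\n",
    "ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean\n",
    "r2_manual = 1.0 - ss_res / ss_tot\n",
    "\n",
    "print(r2_manual, r2_score(y_true, y_pred))\n",
    "```\n",
    "\n",
    "Here the residual sum of squares is 36 and the total sum of squares is 5, so R² is about -6.2: a constant offset alone is enough to push the score below zero."
   ]
  },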
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAaQAAAEkCAYAAABg/EXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAG15JREFUeJzt3X+UHWWd5/H3l9AmSjD8SiCAMYCg/E6wiYkIRqOCgoBn9CArEA6BIMocHcVRdFcyjI66oB5ZkElUBAQFzIgwKjuwkUyEFZiEifwwaCKSNSEmIQw/giJJ+O4ftzp2Ot3pm+7b3U/3fb/O6dP31q1b9X2quvvTT9VzqyIzkSRpoO0w0AVIkgQGkiSpEAaSJKkIBpIkqQgGkiSpCAaSJKkIBpL6XUQ8GhFTB7qOgRQR74uIP0TE+oiY2I/rXR8R+3fx2tkRcU+D1vNERLyjEctS8zCQ1FCd/SHq+IcuMw/NzPndLGd8RGRE7NhHpQ60y4ELM3NkZv5nxxertr9QBcjKiPhaRAzr7Uqr9T3e2+VIfcFAUlMqIOheCzzazTxHZuZI4K3AacA5fV6VNIAMJPW79r2oiJgUEQsj4rmIWB0RX6tmW1B9f6bqJUyJiB0i4r9HxPKIWBMR10fEqHbLPat6bV1E/I8O65kVEXMj4oaIeA44u1r3LyPimYhYFRFXRsQr2i0vI+IjEbE0Ip6PiH+MiAOq9zwXEbe0n79DGzutNSKGR8R6YBjwq4j4XXfbKzOXAfcCE9otf1REfKeqe2VEfKGtBxURr4uIf4+IZyPiqYi4uUObXlc93j0ibq/a8gBwQLv5tuqhRsT8iDi3enxARPy82tZPRcSNEbFLF9uiq30sbcFA0kD7BvCNzHw1tT+It1TTj6u+71IdZvolcHb19TZgf2AkcCVARBwCfBP4EDAWGAXs02FdpwBzgV2AG4FNwN8BewBTgGnARzq85wTgjcBk4O+BOdU6XgMcBpzeRbs6rTUz/1L1eqDWAzqg87f/VUS8ATgWWNZu8nXARuB1wETgXcC51Wv/CNwJ7ArsC/yvLhZ9FfAite11DtvXAwvgS8DewMHUtsesLubtah9LWzCQ1Bd+XPU6nomIZ6gFRVc2AK+LiD0yc31m3reNeT8EfC0zH8/M9cDFwAer/+LfD/xrZt6TmS8Bnwc6Xqjxl5n548x8OTP/nJmLMvO+zNyYmU8As6kdHmvvK5n5XGY+CjwC3Fmt/1ngDmphsL211uvBiHgBWALMp9qOEbEn8G7g45n5QmauAb4OfLB63wZqhwT3zswXM3OrgQpVb+pvgM9Xy3iEWsjVJTOXZeZdVcCuBb7G1tuuzfbsYzUxA0l94dTM3KXti617He3NAA4CHouI/4iIk7Yx797A8nbPlwM7AntWr/2h7YXM/BOwrsP7/9D+SUQcFBE/iYg/Vofx/olab6m91e0e/7mT5yPp3LZqrddR1fJPA94E7FRNfy3QAqxqF/qzgTHV639PrQfzQNRGNHbW8xld1dN+myzvZL5ORcSYiLipOlz4HHADW2+7Ntuzj9XEDCQNqMxcmpmnU/tj+hVgbkTsxNa9G4Anqf0xbjOO2mGr1cAqaoenAIiIVwK7d1xdh+dXA48BB1aHkz5L7Q95I2yr1rplzS3AL6n1+qAWIn8B9mgX/K/OzEOr9/wxM8/LzL2B84Fvtp03amdtVc9rOtTY5oXq+6vaTdur3eMvUdueR1Tb7gy62Hbb2MfSFgwkDaiIOCMiRmfmy8Az1eRN1P5gvkzt/EubHwB/FxH7RcRIaj2amzNzI7VzQ++NiDdXAw3+ge7DZWfgOWB9dZ7mgoY1bNu19sSXgZkRsVdmrqJ2juirEfHqagDFARHxVoCI+EBEtIXzf1ELjk3tF5aZm4AfAbMi4lXVObjp7V5fC6wEzoiIYVUvq/35rp2B9dQGnewDfKqrwrexj6UtGEgaaCcA
j1Yjz74BfLA67/En4IvAvdVhqcnANcD3qI3A+z21E/J/C1Cd4/lb4CZqvaXngTXUehJduQj4b9W83wJu3sa826vLWnsiMx8G/p2//uE/C3gF8GtqoTOX2uAEgKOB+6ttejvwscz8fSeLvZDaIcE/AtcC3+3w+nnV+tYBhwL/t91r/0DtkOKzwE+phVtXOt3H226xmlF4gz4NRVWv5Blqh+M6+2MsqTD2kDRkRMR7q8NPO1G7EsLDwBMDW5WkehlIGkpOoTaY4EngQGqHhjwEIA0SHrKTJBXBHpIkqQj9eoHJPfbYI8ePH9+fq5QkDYBFixY9lZmjt+c9/RpI48ePZ+HChf25SknSAIiIuq/80cZDdpKkIhhIkqQiGEiSpCIM9F0zJTWJDRs2sGLFCl580asGDSUjRoxg3333paWlpdfLMpAk9YsVK1aw8847M378eCIadVF1DaTMZN26daxYsYL99tuv18vzkJ2kfvHiiy+y++67G0ZDSESw++67N6zXayBJ6jeG0dDTyH1qIEmSiuA5JEkD4vzzG7u82bO7n2fYsGEcfvjhbNy4kf3224/vfe977LLLLtu9rnPPPZdPfOITHHLIIVtMv/baa1m4cCFXXnnldi8TYOTIkaxfv76ueadOncrll19Oa2vr5mkLFy7k+uuv54orrujR+geaPSRJTeOVr3wlixcv5pFHHmG33Xbjqquu6tFyvv3tb28VRiVobW3t8zDatKnvbvZrIElN4vzzu/9qJlOmTGHlypWbn1922WUcffTRHHHEEVxyySUAvPDCC5x44okceeSRHHbYYdx8c+2mwlOnTt18GbTvfve7HHTQQbz1rW/l3nvv3by8s88+m7lz525+PnLkSADWr1/PtGnTOOqoozj88MO57bbbtqpt1apVHHfccUyYMIHDDjuMX/ziF3W1af78+Zx00kkAzJo1i3POOYepU6ey//77bxFUN9xwA5MmTWLChAmcf/75m0PmggsuoLW1lUMPPXTzNoDaZd8uvfRS3vKWt/DDH/6wrlp6wkN2kprOpk2bmDdvHjNmzADgzjvvZOnSpTzwwANkJieffDILFixg7dq17L333vz0pz8F4Nlnn91iOatWreKSSy5h0aJFjBo1ire97W1MnDhxm+seMWIEt956K69+9at56qmnmDx5MieffPIWgwO+//3vc/zxx/O5z32OTZs28ac//alH7Xzssce4++67ef7553n961/PBRdcwLJly7j55pu59957aWlp4SMf+Qg33ngjZ511Fl/84hfZbbfd2LRpE9OmTeOhhx7iiCOO2Fz3Pffc06M66mUgSWoaf/7zn5kwYQJPPPEEb3zjG3nnO98J1ALpzjvv3Bwm69evZ+nSpRx77LFcdNFFfPrTn+akk07i2GOP3WJ5999/P1OnTmX06NpFrU877TR++9vfbrOGzOSzn/0sCxYsYIcddmDlypWsXr2avfbaa/M8Rx99NOeccw4bNmzg1FNPZcKECT1q74knnsjw4cMZPnw4Y8aMYfXq1cybN49FixZx9NFHb94mY8aMAeCWW25hzpw5bNy4kVWrVvHrX/96cyCddtppPaphe3jITlLTaDuHtHz5cl566aXN55Ayk4svvpjFixezePFili1bxowZMzjooINYtGgRhx9+OBdffDGXXnrpVsvsatjzjjvuyMsvv7x5+S+99BIAN954I2vXrmXRokUsXryYPffcc6vP8Rx33HEsWLCAffbZhzPPPJPrr7++R+0dPnz45sfDhg1j48aNZCbTp0/f3Nbf/OY3zJo1i9///vdcfvnlzJs3j4ceeogTTzxxi7p22mmnHtWwPQwkSU1n1KhRXHHFFVx++eVs2LCB448/nmuuuWbzCLeVK1eyZs0annzySV71qldxxhlncNFFF/Hggw9usZw3velNzJ8/n3Xr1rFhw4Ytzq+MHz+eRYsWAXDbbbexYcMGoHbYb8yYMbS0tHD33XezfPnWd2lYvnw5Y8aM
4bzzzmPGjBlbrbc3pk2bxty5c1mzZg0ATz/9NMuXL+e5555jp512YtSoUaxevZo77rijYeusl4fsJA2IeoZp96WJEydy5JFHctNNN3HmmWeyZMkSpkyZAtQGINxwww0sW7aMT33qU+ywww60tLRw9dVXb7GMsWPHMmvWLKZMmcLYsWM56qijNg8QOO+88zjllFOYNGkS06ZN29zD+NCHPsR73/teWltbmTBhAm94wxu2qm3+/PlcdtlltLS0MHLkyC57SCeeeOLma8hNmTKFj370o922+5BDDuELX/gC73rXu3j55ZdpaWnhqquuYvLkyUycOJFDDz2U/fffn2OOOab+jdkgkZn9trLW1tb0Bn3SwKhnFF1fhsSSJUs4+OCD+24FGjCd7duIWJSZrV28pVMespMkFcFAkiQVwUCS1G/68xSB+kcj96mBJKlfjBgxgnXr1hlKQ0jb/ZBGjBjRkOV1O8ouIkYAC4Dh1fxzM/OSiNgPuAnYDXgQODMzX2pIVZKGnH333ZcVK1awdu3agS5FDdR2x9hGqGfY91+At2fm+ohoAe6JiDuATwBfz8ybIuKfgRnA1dtakKTm1dLS0pC7imro6vaQXda0XQ+9pfpK4O1A25UDrwNO7ZMKJUlNoa5zSBExLCIWA2uAu4DfAc9k5sZqlhXAPn1ToiSpGdR1pYbM3ARMiIhdgFuBzj7d1umZyoiYCcwEGDduXA/LlIaugf7AqlSK7Rpll5nPAPOBycAuEdEWaPsCT3bxnjmZ2ZqZrW1XxJUkqaNuAykiRlc9IyLilcA7gCXA3cD7q9mmA1vfZUqSpDrVc8huLHBdRAyjFmC3ZOZPIuLXwE0R8QXgP4Hv9GGdkqQhrttAysyHgK1ugZiZjwOT+qIoSVLz8UoNkqQiGEiSpCJ4gz5J26W7YeoOUVdP2UOSJBXBQJIkFcFAkiQVwUCSJBXBQJIkFcFAkiQVwWHfkjar58rjUl+xhyRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgrefkIYIbx2hwc4ekiSpCAaSJKkIBpIkqQjdBlJEvCYi7o6IJRHxaER8rJo+KyJWRsTi6us9fV+uJGmoqmdQw0bgk5n5YETsDCyKiLuq176emZf3XXmSpGbRbSBl5ipgVfX4+YhYAuzT14VJkprLdp1DiojxwETg/mrShRHxUERcExG7dvGemRGxMCIWrl27tlfFSpKGrroDKSJGAv8CfDwznwOuBg4AJlDrQX21s/dl5pzMbM3M1tGjRzegZEnSUFRXIEVEC7UwujEzfwSQmaszc1Nmvgx8C5jUd2VKkoa6ekbZBfAdYElmfq3d9LHtZnsf8Ejjy5MkNYt6RtkdA5wJPBwRi6tpnwVOj4gJQAJPAF64RJLUY/WMsrsHiE5e+lnjy5EkNSuv1CBJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqwo4DXYCkoeX887ufZ/bsvq9Dg489JElSEQwkSVIRDCRJUhEMJElSEboNpIh4TUTcHRFLIuLRiPhYNX23iLgrIpZW33ft+3IlSUNVPT2kjcAnM/NgYDLw0Yg4BPgMMC8zDwTmVc8lSeqRbgMpM1dl5oPV4+eBJcA+wCnAddVs1wGn9lWRkqShb7vOIUXEeGAicD+wZ2auglpoAWO6eM/MiFgYEQvXrl3bu2olSUNW3YEUESOBfwE+npnP1fu+zJyTma2Z2Tp69Oie1ChJagJ1BVJEtFALoxsz80fV5NURMbZ6fSywpm9KlCQ1g3pG2QXwHWBJZn6t3Uu3A9Orx9OB2xpfniSpWdRzLbtjgDOBhyNicTXts8CXgVsi
Ygbw/4AP9E2JkqRm0G0gZeY9QHTx8rTGliNJalZeqUGSVARvPyH1sXpuxyDJHpIkqRAGkiSpCAaSJKkIBpIkqQgGkiSpCAaSJKkIDvuWesEh3VLj2EOSJBXBQJIkFcFAkiQVwUCSJBXBQJIkFcFAkiQVwUCSJBXBQJIkFcFAkiQVwUCSJBXBQJIkFcFAkiQVwUCSJBXBQJIkFcHbT0hd8NYSUv+yhyRJKoKBJEkqgoEkSSpCt4EUEddExJqIeKTdtFkRsTIiFldf7+nbMiVJQ109PaRrgRM6mf71zJxQff2ssWVJkppNt4GUmQuAp/uhFklSE+vNsO8LI+IsYCHwycz8r85mioiZwEyAcePG9WJ1kvRX9QzLnz277+tQ4/R0UMPVwAHABGAV8NWuZszMOZnZmpmto0eP7uHqJElDXY8CKTNXZ+amzHwZ+BYwqbFlSZKaTY8CKSLGtnv6PuCRruaVJKke3Z5DiogfAFOBPSJiBXAJMDUiJgAJPAF4kRVJUq90G0iZeXonk7/TB7VIkpqYV2qQJBXBQJIkFcHbT0iDgLfCUDOwhyRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoKBJEkqgoEkSSqCgSRJKoJX+5bU77q7evns2f1Th8piD0mSVAQDSZJUBANJklQEA0mSVAQDSZJUBANJklQEh31rUHLYsOrhz8ngYg9JklQEA0mSVAQDSZJUBANJklSEbgMpIq6JiDUR8Ui7abtFxF0RsbT6vmvflilJGurq6SFdC5zQYdpngHmZeSAwr3ouSVKPdRtImbkAeLrD5FOA66rH1wGnNrguSVKT6ennkPbMzFUAmbkqIsZ0NWNEzARmAowbN66Hq5PUTLr7/JCGpj4f1JCZczKzNTNbR48e3derkyQNUj0NpNURMRag+r6mcSVJkppRTwPpdmB69Xg6cFtjypEkNat6hn3/APgl8PqIWBERM4AvA++MiKXAO6vnkiT1WLeDGjLz9C5emtbgWiRJTcwrNUiSiuDtJ9S0HFoslcUekiSpCAaSJKkIBpIkqQgGkiSpCAaSJKkIBpIkqQgO+1ZxHI6tktTz8zh7dt/X0QzsIUmSimAgSZKKYCBJkopgIEmSimAgSZKKYCBJkopgIEmSimAgSZKKYCBJkopgIEmSimAgSZKKYCBJkopgIEmSimAgSZKK4O0nNCR5Cwtp8LGHJEkqgoEkSSqCgSRJKkKvziFFxBPA88AmYGNmtjaiKElS82nEoIa3ZeZTDViOJKmJechOklSE3vaQErgzIhKYnZlzOs4QETOBmQDjxo3r5eokqTzdfcxg9uz+qWOw620P6ZjMPAp4N/DRiDiu4wyZOSczWzOzdfTo0b1cnSRpqOpVIGXmk9X3NcCtwKRGFCVJaj49DqSI2Ckidm57DLwLeKRRhUmSmktvziHtCdwaEW3L+X5m/u+GVCVJajo9DqTMfBw4soG1SJKamMO+JUlFMJAkSUXw9hOqWyNu6eDnMVQSb1NSFntIkqQiGEiSpCIYSJKkIhhIkqQiGEiSpCIYSJKkIgy6Yd/1DNN0aHG5HGYrqSv2kCRJRTCQJElFMJAkSUUwkCRJRTCQJElFMJAkSUUYdMO+hxqvoC0Nff6e18cekiSpCAaSJKkIBpIkqQgGkiSpCAaSJKkIBpIkqQgO+24SXmVbEnT/t2Agh5fbQ5IkFcFAkiQVwUCSJBXBQJIkFaFXgRQRJ0TEbyJiWUR8plFFSZKaT48DKSKGAVcB7wYOAU6PiEMaVZgkqbn0poc0CViWmY9n5kvATcApjSlLktRsIjN79saI9wMnZOa51fMzgTdl5oUd5psJzKyevh74TTeL3gN4qkdFlWmotQds02Bhm8o31NoDf23TazNz9Pa8sTcfjI1Opm2Vbpk5B5hT90IjFmZmay/qKspQaw/YpsHCNpVvqLUHetem3hyyWwG8pt3zfYEne7E8SVIT600g
/QdwYETsFxGvAD4I3N6YsiRJzabHh+wyc2NEXAj8GzAMuCYzH21ATXUf3hskhlp7wDYNFrapfEOtPdCLNvV4UIMkSY3klRokSUUwkCRJRRjwQIqIyyLisYh4KCJujYhduphvUFymKCI+EBGPRsTLEdHl0MeIeCIiHo6IxRGxsD9r3F7b0aZBsY8AImK3iLgrIpZW33ftYr5N1T5aHBHFDdrpbptHxPCIuLl6/f6IGN//VW6fOtp0dkSsbbdfzh2IOusVEddExJqIeKSL1yMirqja+1BEHNXfNW6vOto0NSKebbePPl/XgjNzQL+AdwE7Vo+/Anylk3mGAb8D9gdeAfwKOGSga++iPQdT+wDwfKB1G/M9Aewx0PU2qk2DaR9V9f5P4DPV48909nNXvbZ+oGvdRhu63ebAR4B/rh5/ELh5oOtuQJvOBq4c6Fq3o03HAUcBj3Tx+nuAO6h9tnMycP9A19yANk0FfrK9yx3wHlJm3pmZG6un91H7PFNHg+YyRZm5JDO7uxrFoFJnmwbNPqqcAlxXPb4OOHUAa+mperZ5+3bOBaZFRGcfai/FYPs56lZmLgCe3sYspwDXZ819wC4RMbZ/quuZOtrUIwMeSB2cQ+0/hY72Af7Q7vmKatpglsCdEbGourzSYDfY9tGembkKoPo+pov5RkTEwoi4LyJKC616tvnmeap//J4Fdu+X6nqm3p+jv6kOb82NiNd08vpgMth+d+o1JSJ+FRF3RMSh9byhN5cOqltE/B9gr05e+lxm3lbN8zlgI3BjZ4voZNqAjVevpz11OCYzn4yIMcBdEfFY9V/HgGhAm4raR7DtNm3HYsZV+2l/4OcR8XBm/q4xFfZaPdu8uP3SjXrq/VfgB5n5l4j4MLUe4Nv7vLK+M9j2UT0epHYtu/UR8R7gx8CB3b2pXwIpM9+xrdcjYjpwEjAtqwOQHRR1maLu2lPnMp6svq+JiFupHaoYsEBqQJuK2kew7TZFxOqIGJuZq6rDI2u6WEbbfno8IuYDE6md4yhBPdu8bZ4VEbEjMIo+ONTSQN22KTPXtXv6LWrnngez4n53eiszn2v3+GcR8c2I2CMzt3kh2QE/ZBcRJwCfBk7OzD91MduQukxRROwUETu3PaY2sKPT0SqDyGDbR7cD06vH04GteoERsWtEDK8e7wEcA/y63yrsXj3bvH073w/8vIt/+krRbZs6nF85GVjSj/X1hduBs6rRdpOBZ9sOJw9WEbFX27nKiJhELWvWbftdFDHKbhm146eLq6+2EUF7Az9rN997gN9S++/0cwNd9zba8z5q//H8BVgN/FvH9lAbQfSr6uvRkttTb5sG0z6qat0dmAcsrb7vVk1vBb5dPX4z8HC1nx4GZgx03Z20Y6ttDlxK7R88gBHAD6vfsweA/Qe65ga06UvV782vgLuBNwx0zd205wfAKmBD9Xs0A/gw8OHq9aB2s9PfVT9nXY7OLeWrjjZd2G4f3Qe8uZ7leukgSVIRBvyQnSRJYCBJkgphIEmSimAgSZKKYCBJkopgIEmSimAgSZKK8P8B9RQ0b8LoJm8AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Inspect the residual distribution on the training set to check the model assumption of zero-mean Gaussian noise\n",
    "f, ax = plt.subplots(figsize=(6, 4)) \n",
    "f.tight_layout() \n",
    "ax.hist(y_train - y_train_pred_lr,bins=40, label='Residuals Linear', color='b', alpha=.6); \n",
    "ax.set_title(\"Histogram of Residuals\") \n",
    "ax.legend(loc='best');"
   ]
  },
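  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The figure above shows the distribution of the prediction residuals on the training set. Overall it looks roughly like a zero-mean normal distribution, which suggests the model is trained reasonably well. However, there is a dip in one bin close to zero, meaning that some samples whose predictions would be close to the true values are missing. Some outliers were removed earlier; when I tried putting them back, the dip became less pronounced, so the removed outliers do affect training. This may be because the removed outliers belong to continuous features: I believe continuous features (such as humidity and wind speed) are more strongly related to the target than categorical ones (such as month and weekday), so perhaps too many outliers were removed. Alternatively, the sample may simply be too small: with only about 350 samples, there is not enough data to train a stable model."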
   ]
  },
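  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The visual check of the histogram could be complemented by a numeric one. The sketch below uses synthetic residuals as a stand-in for `y_train - y_train_pred_lr`; the Shapiro-Wilk test from `scipy.stats` is one possible choice here, not something the original analysis used.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "# Synthetic residuals (assumption): zero-mean Gaussian noise, 357 samples\n",
    "residuals = rng.normal(loc=0.0, scale=0.5, size=357)\n",
    "\n",
    "# The mean should be close to zero if the noise is unbiased\n",
    "print('mean of residuals:', residuals.mean())\n",
    "\n",
    "# Shapiro-Wilk test: a small p-value suggests a departure from normality\n",
    "stat, p_value = stats.shapiro(residuals)\n",
    "print('Shapiro-Wilk statistic:', stat, 'p-value:', p_value)\n",
    "```\n",
    "\n",
    "On the real residuals, `(y_train - y_train_pred_lr).ravel()` would take the place of `residuals`."
   ]
  },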
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAARgAAAEYCAYAAACHjumMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8lPW1+PHPyWQCE7YARpSwiGJxQ4nGpdW2AgJWREFUcKm23KttrVUrchW1ol1woRVt9bb6q7UuCIILBREVS6gtLVpAUKnLLVrRoCECYUsgy5zfHzMTh2SWZ5KZeWY579crLzNLnucEmcN3PV9RVYwxJhUK3A7AGJO7LMEYY1LGEowxJmUswRhjUsYSjDEmZSzBGGNSxhKMMSZlLMEYY1LGEowxJmUK3Q4gEQcccIAecsghbodhTN5bs2bNF6paGu99WZVgDjnkEFavXu12GMbkPRH52Mn7rItkjEkZSzDGmJSxBGOMSRlLMMaYlLEEY4xJGdcSjIh0FpE3RGS9iGwQkTvcisUYkxpuTlPvA0ao6m4R8QJ/E5GlqrrKxZiMMUnkWgtGA3YHH3qDX1a/0xiX+P3+pF/T1TEYEfGIyDpgC7BMVV+P8J4rRWS1iKyuqalJf5DG5IF169bxjW98gx07diT1uq4mGFVtVtVhQD/gJBE5JsJ7HlbVClWtKC2NuzLZGNMOxx13HFOmTMHr9Sb1uhkxi6SqtcAK4EyXQzEmryxatIh33nkHEWHKlCkUFxcn9fpuziKVikhJ8HsfcAbwnlvxGJNvFi9ezPnnn8/06dNTdg83Z5EOBh4TEQ+BRDdfVV9wMR5j8sbSpUs5//zzOe6443jyySdTdh/XEoyqvgWUu3V/Y/LVsmXLmDBhAkcffTSvvPIKPXr0SNm9MmIMxhiTPg888ABDhgxh2bJl9OzZM6X3yqp6MMaY9lNVRIR58+axZ88eevfunfJ7WgvGmDzw97//nVGjRlFbW4vP5+OAAw5Iy30twRiT49544w3OPPNMNm3aRF1dXVrvbQnGmBy2Zs0aRo8eTWlpKcuXL6dv375pvb8lGGNy1Pr16xk1ahQlJSUsX76cfv36pT0GSzDG5Kju3btzzDHHUFlZycCBA12JwWaRjMkxVVVVHHzwwQwaNIi//OUviIhrsVgLxpgc8sEHH1BRUdGy/N/N5AKWYIzJGRs3bmTEiBE0Nzdz+eWXux0OYF0kY3LCRx99xPDhw9m7dy+VlZUcddRRbocEWIIxJus1NTVx1llnsXv3bpYvX87QoUPdDqmFJRhjslxhYSH3338/vXv3ZtiwYW6Hsx8bgzEmS3322Wc888wzAIwePZoTTjjB5YjashaMMVmourqaESNGsHnzZk4//fS07S1KlCUYY7JMTU0NI0eOZNOmTbz00ksZm1zAEowxWWXbtm2MGjWKDz/8kCVLlvD1r3/d7ZBisgRjTBZ59tlnee+991i8eDHDhw93O5y4LMEYk0WuuOIKRo4cyaGHHup2KI7YLJIxGW7Xrl2MGzeOtWvXAmRNcgFLMMZktD179jB27FiWLl3Kpk2b3A4nYdZFMiZD1dXVcfbZZ7Ny5UrmzZvH+PHj3Q4pYdaCMSYD7d27l3PPPZfXXnuNJ554ggsuuMDtkNrFEowxGaq4uJhHH32Uiy++2O1Q2s26SMZkkIaGBurq6igpKWHhwoWu13PpKGvBGJMhGhsbmTx5MiNGjKChoSHrkwu4mGBEpL+IVIrIuyKyQUSudSsWY9zW1NTEJZdcwvPPP8+UKVMoKipyO6SkcLOL1ARMVdW1ItINWCMiy1T1Xy7GZEzaNTc3c9lll7FgwQLuvfderr76ardDShrXWjCq+pmqrg1+vwt4FyhzKx5j3HLzzTczd+5c7r77bn784x+7HU5SZcQgr4gcApQDr7sbiTHpd/XVVzNw4ECuuuoqt0NJOtcHeUWkK/AscJ2q
7ozw+pUislpEVtfU1KQ/QGNSQFV54oknaG5upn///jmZXMDlFoyIeAkklzmq+lyk96jqw8DDABUVFZrG8EwSLXyzilkvv8/m2nr6lviYNmYI48vzs0esqvzoRz/iwQcfpHPnzlm7iM4J1xKMBObgHgHeVdV73YrDpN7CN6uY/tzb1Dc2A1BVW8/0594GyLsko6pcf/31PPjgg9xwww2cf/75boeUUm52kU4Fvg2MEJF1wa+zXIzHpMisl99vSS4h9Y3NzHr5fZcicoeqcuONN3Lfffdx7bXXcs899+TEWpdYXGvBqOrfgNz+0zUAbK6tT+j5XLVx40YeeOABrrrqKmbPnp3zyQUyZBbJ5K6Fb1ZRIEKzth0+61vicyEi9wwePJg1a9YwZMiQvEgukAGzSCZ3hcZeIiUXn9fDtDFDXIgq/WbOnMlDDz0EwJFHHklBQf587PLnNzVpF2nsBcAjwp3nDc2LAd577rmHW265hb/97W9ohESb6yzBmJSJNsbiV82L5DJ79mxuvPFGJk+ezKOPPpo33aJwNgZjUqZviY+qCEnGydhLtq+beeCBB7j++uuZOHEijz/+OIWF+flRsxaMSZlpY4bg83r2e87J2Eto7Kaqth7ly3UzC9+sSmG0yVVfX8+5557LU089hdfrdTsc10g29QsrKip09erVbodhEtCelsipdy2P2PIpK/Gx8qYRqQo1KbZv307Pnj0B8Pv9OTugKyJrVLUi3vvys91m0mZ8eVnMhBIpAWXrupnHH3+c6667jhUrVnDsscfmbHJJhP0JGNdE6wr18EXuUmTyupmnnnqK7373u5SXl3P44Ye7HU7GsARjXBNtC4EI7Rq7ccuCBQu47LLLOO2001i0aBE+X+YmwnSzBGNcE63LU1vXyJ3nDaWsxIcQGHvJ1HUzK1eu5OKLL+aUU05hyZIldOnSxe2QMoqNwRjXxJrGjjd2E0+6prlPPPFEpk+fzg033EDXrl2Tfv1sZ7NIxjWtyzhAoCvUntZKKKFU1dZTIOCP8Ne6Z7GXGeOOTkqiWbFiBUcffTSlpaUdvlY2cjqLZF0k45rx5WVJ6QqFDxZD5OQCsL2uMSnraV599VXOPPNMrrvuug5dJx9YC8ZkvfKfvsL2ukbH7y/xeVk3Y3S77rVixQrOOussBg8eTGVlJb17927XdbKdtWBMXlj4ZlVCyQWgtr6RYXe8knBL5q9//Stjx45l0KBBvPrqq3mbXBJhg7wmIzkdpG1vVbza+saYZTtb3/+G0V9h5vXX079/f26470km/OGdrN0nlU7WRTIZZeGbVdyxeEObVkm0wd9BNy0hGX+DRUA1MA40/IhSnl1T1WbwedrXDwSUWX+tScrAdDazLpLJOqHB2khdnmg1fEuKo28k9IhQ7HX2Vzz072xVbT1zVm1qSSD7Pv83W19+gLp9DTyytpZH1u6w+sIJsC6SyRjRClSFRFqYF6sBftHJ/akY2Itpz6ynsdl5Oyf0zoYtH7Ll6Z8gRT6a63awucAT9WcyfZ+UWyzBmIwR70MaaS/SjvroA7xP//MTAJoSSC4hDTX/oXrerYi3E30umklh114t929vjZt8ZAnGZIxoK3vhy71IrQdfe/i81EZJMo3NylOvb0p4jKbxi08CycVTSJ+LZuItOWi/vVCRFgdm6j4pt9kYjMkYkQpUQWDdyp3nDQVos/t6T0NTzGtGW3QXS2Hjbkp69ODYK++lqGffNgsAO4eN64Riy6cB3kTYLJLJKNGmpxe+WcXU+esjnlAQbWtAovwN9XiKfFxyygBuP/tICgsL94unh8/Lnoam/cZzfF4PE08oo/K9mryatraCUyYrhT6YoQ/1rJffZ/XH23h2TVXE5AKB5OItEBo7kGWadlTz+VM30eOUC6gsmcjPxw9ts1cqUlesvrGZOau+7Ibl87G4kVgXyWSUSEWowqeNIykr8THrguPafUxo084tfD73ZrShnk59j6Cqtp5T71rOHYs3xLxv
SOu0ZtPWX7IEYzJKpKnqWO2S0ABre1sLTbu+oHruLfj37ubAC39GUZ9DgUBiS3QLQjibtg5wNcGIyB9EZIuIvONmHCZzJPLBbH2AW6JTxf7GfVTPu5Xmulr6XPhTOh2cvFKXNm0d4HYL5o/AmS7HYDJItHq8rfm8Hn514XH7tVyizUJFU+DtRPcTxnHgBXfQqW/7ppmjLRQefkR+1olpzdVBXlV9TUQOcTMGk1miHX7YpchDSXFRzJma1gPEfUt87NnX1GZwtrluB021n9Op7xC6HT+2Q/E2+iM/X/leTYeumysyfhZJRK4ErgQYMGCAy9GYVKuNMu6xp6GZkmKYPWlYzPGWUKnN0PRybX0jwpfjOM31u6h++laad2+j7HuPUFDUOfm/BDYGE+J2FykuVX1YVStUtSJfyxPmk1hjF05PeGxd4S6UXPx7d7Nl/k9o3PopB5w9NWXJBWwMJiTjE4zJL/HGUZxMAUeaifLv20P1/Nto2PIfDpxwM75Bxycl3khs68CX4naRRKSTqu6L95wxiYpVVCraql2I3P0Iv1akn9q5ehEN1RspHX8zvsNOTOavsZ9kFhbPBU5aMP9w+FzCRGRu8FpDRORTEfmvZFzXZL5YB9yPLy/DH2MLS+vuR+trRdLjqxdy0MV3U3z4ycn7JSLYG23UN09FTTAicpCInAD4RKRcRI4Pfp0OFCfj5qp6kaoerKpeVe2nqo8k47om80U71THU/Yk2hiHQpvsRrY6Mv3EvW1/6DU27tyEFHjqVHZGc4GOwVbz7i9VFGgN8B+gH3Bv2/C7g5hTGZPJAvAPup40Z0qYsggCXnDKgTfcjYiGqpgZqnvsFe/+zDt+gEygc8rXkBR+HzSB9KWqCUdXHgMdEZKKqPpvGmEweiHWqI0Re0xJtS0Dra2lTI1uCyaX3WddSnMbkEorHBDhZB/OCiFwMHBL+flX9aaqCMrkvUgul9eyL0+Njp40ZwnVPrwNAmxupWTiTvR+todeZP6Lr0DOSH3wMNoO0PycJ5k/ADmANYDNHJimctFBazzINP6K0pe5KSbEX1UDJzB4+b8tiOv++Opp2bKHX6KvodtyYhOPyiESdvYpFIG9qwSQibsEpEXlHVY9JUzwxWcGp/BHp3OpY1N8MqoinEG1qRAqd7WlqrUuRhz0Nzu4Z4hFpsy8q1yXz2JK/i8jQJMRkjGPxThgIp/5mvnjhXmoW3Y2qv93JpUBIOLkANKsm5czrXOQkwZwGrBGR90XkLRF5W0TeSnVgJr85nYlRfzNbl95P3bt/odPBX0GkfYvTO1p206anI3MyBvOtlEdh8k68o2FjnTAQoupn60sPsOed5fQ47RJ6nHJBu+NJRk1fm55uy0m6PxjYpqofq+rHwDbgoNSGZXJZrFW8IdPGDIlbAnN75R/Y8/Yyenx1EiWnXpTSmJ2w6em2nCSY3wK7wx7vCT5n8tjCN6s49a7lDLppCafetTyh8Yd4q3ghMMsUr1HR5YivB1ouX780kdBTwqanI3OSYETDpppU1U8W1JExqeOkBRJLvFW8IWURWgSqyt5NgSHATn2HUHLqRUi0KlVJ5PN6uPSUAZSV+BACmxpLgtPjrc9NMl9ykig+FJFr+LLVchXwYepCMpkuWgtk6vz1/PjpdXHXg8RbxRvSejGeqlL7lz+y8/VnOfDCn+EbVJ6k3yg22yHdfk5aMN8HvgZUAZ8CJxOsMGdyg9PuTuh90QZfm1VbWjTTFqyPep1INV8k+HPh9x9fXsad5w2lJFind8dfn2Tn68/StfwsOh8yrH2/bDsUFxVacmmnuC0YVd0CTE5DLMYFrRe0RTs4LNGFb41+5fZFGyJ+MMNX8VbV1u9X0rKqtp7rnl7HHYs3MGPc0QDsaWiiduVcdvzjaboeO5peo76flm5RiM0OtZ+NpeS5WAOu4ckhkYVvIdEOpW8t0mDu9rrGlv1FDdUb2fG3OXQ55gx6nXl1u9e6tJfNDrWfJZg853TANZn/
iifaGirqcxgHTv4Fnfsfk/bkYrNDHWM1efNctH+dWz8f7X1lJT56Fkdemh/teaetoV1rX2Dvx4EZI9/A45AC52cetY6xPR0qmx3quKgtGBG5PtYPquq9sV432cFJ2QQn75v2zHoam7/s7Hg90jKGEhJavRtvhS7ArrVL2Lbsd3Q5ejidBx7brt8NAv+CThszxPF9QwRYedOIdt/XBMTqInUL/ncIcCKwKPh4HPBaKoPKN/GWzaeS08JOTt4Xr/SC027RrnUvsW3Zb/ENPpne37qmQ7+fH1j98TbqGpoS+jmnJ0ya2JyUa3gFmKiqu4KPuwELVDXtR77mYrmGSB88n9eTE03z8MRZEKfOirdA8HqELWteZuvS+/EdWkHphFvavTO6o3oWe3nzttGu3DsbOC3X4GSQdwDQEPa4gUB1O5METmdxskF4Qikp9rJ7bxONwV2E8Yo4NfqVRr+y95MNdD6knNIJN7uWXCD6CZMmMU4SzBPAGyLyPIEZxQnA4ymNKo84ncXJdK1bYtsT/IBqcxPiKaT3t34E/maksCgVYTpmU9PJEXcWSVV/AXwX2A7UAt9V1ZmpDixfOJ3FyXTtWScTsuf9lWz+w9U07fwCKfAknFzKSnwR9y11hE1NJ4fTaepiYKeq3g98KiKDUhhTXom0bD6T1l443UbgtMXVegFu3f+t4otF9+Dxdaegc5eE4wv9WcU7crY1b0H0iesSnzdi97QjO8jzlZOjY2cAFQRmkx4FvMCTwKmpDS0/JHI8R7o53UYAzgpEeQoEf1hlp7qN/6Rm4V0U9RnMgRfcTkFR4q2Q1oPhTgeVCz3SMj4Uzuf1cPs5R7d5PpE/C/MlJ7NI64ByYK2qlgefe0tV2784oZ1ycRYpk0Xb2FhW4muzRiTR1bl7N71N9fyfUFQ6iD6TfkZB564JxxcpjpBBNy2JW08mJLQXqixGck/kzyIfJHMWqUFVVUQ0eOHE27FRiMiZwP2AB/i9qt6VrGubjktkADr0obxj8QZHA7ze0oF0OfIb9BxxRbuSS7xupJMWVUgoucRKFLkyGJ9uThLMfBF5CCgRkSuAKcDvO3pjEfEADwKjCJSB+KeILFLVf3X02rnIjcV40T6kBSIth9S3Fu/w94bqjXh7D8Dj684BY2MuFm9R4vNSW9/YcmZRqKUBX7YsWr82bcyQNquLY4mXKJzWsDH7i9tFAhCRUcBoAq3Jl1V1WYdvLPJV4HZVHRN8PB1AVe+M9jP52kVyazFerG6Pt0AoKixoOeYjVLMl1g7qvZ9uYMv8GXQ9dhS9zvie4zgi/a4xY/MIhQVCfZxkFy5eCyaXF0S2R9K6SCJyt6reCCyL8FxHlAGfhD0OFbNqff8rCRa4GjBgQAdvmZ3cWowXuvbU+evbDJg2+pXGsDOE4pVm2Ff1LlsW3I6n2wH0OOXChOII/a6rP97G3Nc/ib9or1mjtlwKJHBQWvgAr5NZu0wejM9kTqapR0V4LhlHmUSaJ2zzt0JVH1bVClWtKC0tTcJts4+b/f/x5WXtOko13L7PPqB6/gw8XUroM/kXeLr2TPgaVbX1PLlqU4djUYVZFxzXssM6kR3T48vLWHnTCD66aywrbxphycWBWLupf0Cg/u5hrQ5a6wb8PQn3/hToH/a4H7A5CdfNOe3t/ydj3Gbhm1X7VZxLlPqbg+tcutFn8kwKu/Vu55WSoyC4ECcfZ37cEKuL9BSwFLgTuCns+V2qui0J9/4ncHhw0V4VgbKcFyfhujnHaUmFcImu27jk//2DlRu//N9a5BEam5UCkYSSS5ciD3UNzS0/IwUeSsffTEHnLhR2b18L1Of1tHuVcGuhY17B1q+kQ9QukqruUNX/EJhGDj94rVFE2oyVJEpVm4CrgZeBd4H5qrqho9fNRaHi14k0652cPRTSOrkANDQHCngn0iUpEPB6ClCgoeZjdr7xPABFfQ6lsEcfx9dpLVnJJfx6dsxrejiZpv4tcHzY4z0RnmsXVX0R
eLGj18kH48vL4v6LG94lipYWIo3btE4uTnQp8rQ5KN4jQm19I41bP6H66VsQKaDL0JF4fN0Tvn6q2fqV9HCSYNocvCYiVss3wzhdSZuMdRsCbZILBGaW/Ns3Uz3vFoDAgG4GJhew9Svp4mQW6UMRuUZEvMGva7GD1zLOHYs3xE0uHdlE6QkOjsYa8G3c/hmbn5qO+pvpM+kXeHv3j/LO4DVjbDiMRoD7Jg1riSfS606ukSmbSXOdHbyWAxa+WRVzeX68cZsiT+yPpc/r4VcXBqZ2Y43INHz2Aepvos+kn1NUOjBu3AUEKscJgYV63jhxQKCU5bQFbdflhOK8JOx412gUG+BNFzt4LQfEGrB0shkv1nL6Ep+X288JHJsaOqeoNfU3IwUeuh71TXyHVVDQydl2tUa/UlxU2FKa8taFbzNn1abYSaypOeIuaKHtzupYGxRNekRtwYjI/wT/+xsR+XXrr/SFaOKJNWDppCsQazyiS6fAsamh9TCtNe36gs8evYbi6rdRcJxcQsJjr3yvJmZy6VnspS7K8v9IrZJMr7WTD2J1kd4N/nc1sCbCl8kQ0RJEaH9QvCJJsT5woQQw6+X323z4m3Zvo3reLRTWb+Xy04+KOi7iNPZYidLn9bQ5BiWe9kzvm+SK2kVS1cXB/z6WvnCME61X6A4/opRn11S1WYh39nEHO1psN768LGqZBSVyV6N5z3a2zLuF5l1b+elDc/njv4to1raDzLEGhb0e2S+5RVux7BFpSQzR4ox2yJuT6X2TOrG6SItFZFG0r3QGab4Umo6uCq51qaqt59k1VUw8oazNv9SV79U4Xmw39tiDo96z9Yfev28P1fNupWnHFg48fwa//6Ao6gzW1w7rFXXAtUtR4X4f/mhdml9deFzL+2aMO7rNYHCkQ95MZog1yPvL4H/PAw4iUCYT4CLgPymMycQQbYVu5Xs1bQZzfxxlUDZSV+SF9Z85jkGKfHQeeCy+kVfQecDQmKt9/75xW9QWzI5WO7Cd7Fi2Xc3ZJVYX6S8AIvIzVf1G2EuLRcROdnRJIjurnW6SXPhmVdxyCwD+vbvxN9RR2P1Ax/VcYg3aRho7ctKlsW5P9nCyDqZURA4NPQhuTszPugkZIJFjTpzOojjZl+Pft4fq+bdRPe9WtLnjh5LZYrf84GTJ/4+BFSISWr17COC8HJlJqkR2VjvtTsTbl+PfV8eW+TNoqN5I6fjpiCexExdbD/QKcMkpA9pVOsK6RtnFyUK7l0TkcOCI4FPvqeq+1IZlokl0DGL1x9v4fMdeFPh8x15Wf7wtoSNH/A172fLMHez77AMOOPdGig8/JaF4fV4PE08oo/K9mg7XpbFjQ7KPk2NLioHrgYGqekUw2QxR1RfSEWC4fK3J2163LnybJ1dtavO8z1vAnecd2/LBjLVRcturD7Fr7RIOGHcDXY78xn6viQQqxEXjEdlvBqgj7NiQzOK0Jq+TMZhHCRx4/9Xg40+Bn3cgNpMmc1//JOLz9Y1+rnt6HeU/faXldICJJ0ROAiWnXULpxJ+0SS4QO7kA+FWT1rqwY0Oyk5MxmMNUdZKIXASgqvUi7ViymWcyYbwgXrGo7XWNTFuwngWrN/H3sJow2tTIjlUL6H7yRAo6d6X4sBPbdf9klkSwY0Oyk5MWTIOI+AiO04nIYYCNwcQQaTHc9OfeTvtZxk6W7jf6lZVha1W0uZGaP93JjpVPsffjyOtonEj2LJHtK8pOThLMDOAloL+IzAH+DPxPSqPKcomUq+yoWAeyX3Ry7HosrWlzEzWL7qH+32/Qa9QPKB58Mj6vh1NjrMaNei2SO/hq+4qyU8wuUrAr9B6B1bynEPiH6VpV/SINsWUtJ+MFyar4H2tm5efjhwJEHOhtTf3NfLH4l9R/8A96jrySbseP3a8EQni8JcVedu9tilg2ISQVJRFsgV32iZlggmdSL1TVE4AlaYop68UbL0jWlGu0ltIdize0JIMePmdrVpp21rB3
01v0HD6F7hXnAFBQINy+aAM/fnpdmyQYSjhVtfVt1rlY18WEOJmmfhD4o6r+Mz0hRZct09Txjhnt6JRr+Ie7owTFI0KTQnPdDjzFPaK+N9pRqZkwoG3SK2lHxwLDge+LyH8InCggBBo3x3YsxNwVbzFcR6ZcF75ZldCh7rGo+ilZ+yhdSw7gk0PPiZlcIPpxtdZ1MdE4STDJOCY278T60HVkyvWOxRuSlFyUba/8lk3rllLytUl0PzT+z0Db0g3GxBKrHkxnEbkOmAacCVSFDl8LHsBm2qkjU66xins7paps//PD7F63lO4nT6T7aZfidGmTQNqn2032ijVN/RhQAbxNoBXzq7RElAfaO+WarA927YpH2bVmMd0qzqXkm99xnFwgMJhrpyIap2J1kY5S1aEAIvII8EZ6QsodsQY/Ex23CA0cd0Rotqeoz6F0qziXniP+O6HkEmLL841TsVowLW3x4DnSSSMiF4jIBhHxi0jckehslOzVvLcvin+wWizeAsGzK1C1rstRp9Nr5BVxk0u0V215vnEqVgvmOBHZGfxeAF/wcWgWqSNngr5DYPHeQx24RkaLt5o3fNGaaqB8ZLQpXqcV56Ip8Xn5ZPkTbPvbPA769i/pdNDgqO/1iOBXjVlM3Na4GKdilcz0RHuto1T1XaBdzfNsEa0bEWrJhD604YO20RbcdWTMQ4Dt/1jAtteepMsxIynqE3u6yK/KR3eNbXlcMbCXrXEx7Zbxh9iLyJUEj6odMGCAy9HsL9YYS6wjOGJ1deobm7nu6XXMevn9lut1ZGpY317Mppd/T/FR36T3t65BJPb2s9bdH1vjYjrCyWbHdhGRV0XknQhf5yZyHVV9WFUrVLWitDRzSgHHG2OJNhUdr4RCSOh6ty5s/8Bu4ydv8fGLD1F8xNc5YOz1SEHsRql1f0yypawFo6pnpOramSDWGEv4v/qtWziJLPGvb2xmjoONipH0LPZy29RvM23fNhoHnRY3uYSfQW1MsqSsBZPrnCz3H19exrQxQ+hb4mNzbT2zXn6f4UeUtmnZxJLoml2PCD8/ZjsLLj6UCcf3o3nw6Ygn/r8joTOojUkmVxKMiEwQkU8JlOFcIiIvuxFHRzg5PiRSN2rOqk0cP6BHSsqLTqK4AAAOo0lEQVQZAOx4axmXXXYZM2fOBHC8m9rWtphUcCXBqOrzqtpPVTupah9VHeNGHB3hZLl/pG6UEjjtcNqYIQkXcYpn94ZKtr54P90PO54/9xpH+U9faXN6YjS2tsWkgnWR2snJcv9orYLQcvtkfqj3vPsaW5fMxjdwKN3PmY4UFrG9rtFRF8sGd02qZPw0dSaLN4Ub67yhzbX1zJ40LOpxIU4J4Fc/DW+/RJcBR9HrvNso8HaO+3PhC+psbYtJFUswKTRtzBB+/PS6iK2IviW+lg/1HYs3tHuXtF+Vfj27cNfj87nx2bcoKIqfXKIVjjIm2ayLlELjy8u45JQBbcZawrsk48vLePO20dw3aVjC16/f+E+2LJjBJ1u2M/2Ff1PQqTjuz3hELLmYtLEEk2Stq/xXDOzF7EnD4pZmGF9eltDMUv1Ha9ny/Ez8dTvA72wvqteTvJMWjXHCukhJFK2Y953nDXVUazfSwfaR1H+8nprnfo63dz8OnPRzCjp3dRRfYYFYcjFpZS2YJOroeUihmamSGGtX9n7yDjXP/pTCkoPpM+nneHzdHMdX3+i3anQmrSzBJFEyzk8eX15Gl07RG5YeXw869R1Cn8m/iFukOxKrRmfSybpISdTD541Yt8XJatrwndmRZp2admzB070U7wH96TN5ZrtjtBW7Jp0swSRRtPI2rZ9vXeYhUmGncPs++z+q591Cj69eSI9Tzm/zes9ir+Npbluxa9LJukhJVBvlQx7+fLT9SdGSS0P1RrY8fSseXze6HPXNiO8JTXPHGrsBW7Fr0s8STBJFax0UiLQMrkbbnxRJw5aPqJ53K9KpmD4XzaSwe/R6OOPLy1g3
I5BoQlPiPYu9lPi8dli8cY11kZIo2jRzs2pLKUynYyD+xr1sWTADKSyiz+SZFPboE/F9PYv3b7VYBTqTSSzBJFHogz11/vo2letC09Wx9ieFK/B2ptfoH9K5tD+ekoOjtnIcFsgzxhXWRUqy8eVl+KN86jfX1scdA2ncvpm6/3sdgOLDT6agpG/Mgdna+saWVcO2xsVkGmvBpEB7z55urP2c6rk3g/rpPPC4lo2LVbX1LYemRRJeExiwLpLJGNaCSYFYxaiiLXRr2rGF6rnT0cZ9HHjB7fvtivaIOKrrksiqYWPSwVowKdC64HcPnxcRopZuaNpZQ/Xc6XTWBkovmYkeMKjlNZ/Xk1C9GFtIZzKJtWBSZHx5GStvGsHsScPY1+SPWV1uz7/+gu7dTeWflzH7hxPa7LxOZJe1LaQzmcRaMCkWad1La91PnsilF0/mxBNP5EQij6E42WVtC+lMprEWTIpF67I076mlet4tNH7xCSLC4g+bos4Ehdf/jcYKSZlMZAkmxSJ1WZrrdlD99K3sq3qP5rraludbnw4ZLtTlum/SsIgDyFZIymQiSzAp1npGqbl+F9VP/4Sm7ZspnfgTOg8Yut/7480EOTnNwJhMYWMwKRY+o/TJ5zVsXXAbjVs3ceB5P8F3SOQ6vPFmgmw7gMkWlmDSIJQQdu/ezcR3/8CJY2+lsq5f1C0DNhNkcoUlmDTYtWsXAN26deOll15CggViWtfwBZsJMrnFlQQjIrOAcUADsBH4rqrWxv6p7LRnzx7Gjh2LiFBZWUlBwZfDXq0X5NkhaCbXuNWCWQZMV9UmEbkbmA7c6FIsKVNXV8e4ceNYuXIlc+bM2S+5tK5qN3vSMEssJue4Moukqq+oaugwn1VAPzfiSKW9e/cyfvx4VqxYwWOPPcbkyZNbXotU1S7a9LQx2SwTpqmnAEujvSgiV4rIahFZXVNTk8awOuaHP/why5Yt45FHHuHSSy/d77WOHm9iTLZIWRdJRF4FDorw0i2q+qfge24BmoA50a6jqg8DDwNUVFRkTXml6dOnc/rpp/Ptb3+7zWvJON7EmGyQsgSjqmfEel1ELgfOBkaq5kZdtsbGRp588km+853vMHjwYAYPHhzxfe2tF2NMtnGliyQiZxIY1D1HVevciCHZmpqauPTSS5kyZQorVqyI+d5Y9WKMySVuzSI9AHQClgXXhKxS1e+7FEuHNTc3c/nllzN//nxmzZrF8OHDY77fpqdNvnAlwahq5L5DFvL7/YyacBGVixdQ8o3LeL7peAa/WRU3Wdhyf5MPMmEWKavNnvcKlUuep8dpl9DjqxfalLMxYSzBdNBzm4roO+UBSk69qOU5m3I2JsD2IrWDqjJ16lQqKirYXNsDb++26wRtytkYa8EkTFWZNm0as2fPZu3atVGnlm3K2RhLMAlRVW6++WZ+9atfcfXVVzNr1iybcjYmBusiJWDGjBncddddfO973+PXv/41ImJTzsbEYAnGIVXF7/fzX//1X/zv//5vS00XsClnY6KxBOPA1q1b6d27Nz/72c9Q1f3KLhhjorNPShz33nsvRxxxBB9++CEiYsnFmATYpyWG3/zmN0ydOpXhw4czYMAAt8MxJutYgonid7/7Hddccw0TJkxgzpw5FBZab9KYRFmCieCFF17gBz/4AWeffTbz5s3D6/W6HZIxWckSTAQjR47k9ttv55lnnqGoqMjtcIzJWpZgwrz44ovU1tbi8/mYMWMGnTp1cjskY7KaJZig+fPnM27cOG677Ta3QzEmZ1iCAZ577jkuvvhiTj31VO688063wzEmZ+R9glm8eDGTJk3ipJNOYsmSJXTp0sXtkIzJGXmdYBobG5k6dSrHH388S5cupVu3bm6HZExOyevFHV6vl2XLltG9e3d69OjhdjjG5Jy8bMEsX76ca665Br/fz8CBA+nZs6fbIRmTk/KuBfPaa68xbtw4Bg0axM6dOykpKXE7JGNyVl61YFauXMlZZ53F
gAED+POf/2zJxZgUy5sEs2rVKr71rW/Rt29fli9fTp8+fdwOyZiclzcJZseOHQwYMIDly5dz8MEHux2OMXkh5xPMzp07ARgzZgzr16+nX7+2JwAYY1IjpxPM22+/zeDBg5k7dy4AHo8nzk8YY5LJlQQjIj8TkbdEZJ2IvCIifZN9j3/961+MHDkSr9fLSSedlOzLG2MccKsFM0tVj1XVYcALQFJ3GL7//vuMGDECj8dDZWUlhx12WDIvb4xxyJV1MKq6M+xhF0CTde1t27YxYsQIVJXKykq+8pWvJOvSxpgEubbQTkR+AVwG7ACGx3jflcCVgKO6uL169WLq1KmMHj2aI488MknRGmPaQ1ST1njY/8IirwIHRXjpFlX9U9j7pgOdVXVGvGtWVFTo6tWrkxilMaY9RGSNqlbEe1/KWjCqeobDtz4FLAHiJhhjTHZxaxbp8LCH5wDvuRGHMSa13BqDuUtEhgB+4GPg+y7FYYxJIbdmkSa6cV9jTHrl9EpeY4y7LMEYY1LGEowxJmUswRhjUsYSjDEmZVK2kjcVRKSGwLR2PAcAX6Q4HKcyJZZMiQMyJxaLoy2nsQxU1dJ4b8qqBOOUiKx2sow5HTIllkyJAzInFoujrWTHYl0kY0zKWIIxxqRMriaYh90OIEymxJIpcUDmxGJxtJXUWHJyDMYYkxlytQVjjMkAlmCMMSmTswkmHScXOIxjloi8F4zleRFx7bxaEblARDaIiF9E0j4tKiJnisj7IvJvEbkp3fcPi+MPIrJFRN5xK4ZgHP1FpFJE3g3+f7nWpTg6i8gbIrI+GMcdSbu4qubkF9A97PtrgN+5FMdooDD4/d3A3S7+mRwJDAFWABVpvrcH2AgcChQB64GjXPpz+AZwPPCOW/8vgnEcDBwf/L4b8IEbfyaAAF2D33uB14FTknHtnG3BaApPLkgwjldUtSn4cBXg2tGSqvquqr7v0u1PAv6tqh+qagMwDzjXjUBU9TVgmxv3bhXHZ6q6Nvj9LuBdoMyFOFRVdwcfeoNfSfm85GyCgcDJBSLyCXAJST57qZ2mAEvdDsIlZcAnYY8/xYUPU6YSkUOAcgKtBzfu7xGRdcAWYJmqJiWOrE4wIvKqiLwT4etcAFW9RVX7A3OAq92KI/ieW4CmYCwp4yQWl0iE52yNBCAiXYFngetatbzTRlWbNXAQYj/gJBE5JhnXde1cpGTQDDm5IF4cInI5cDYwUoMd3VRJ4M8k3T4F+oc97gdsdimWjCEiXgLJZY6qPud2PKpaKyIrgDOBDg+CZ3ULJpZMOblARM4EbgTOUdU6N2LIEP8EDheRQSJSBEwGFrkck6tERIBHgHdV9V4X4ygNzW6KiA84gyR9XnJ2Ja+IPEtgxqTl5AJVrXIhjn8DnYCtwadWqaorpyiIyATgN0ApUAusU9Uxabz/WcB9BGaU/qCqv0jXvVvFMRc4nUBpgmpghqo+4kIcpwF/Bd4m8PcU4GZVfTHNcRwLPEbg/0sBMF9Vf5qUa+dqgjHGuC9nu0jGGPdZgjHGpIwlGGNMyliCMcakjCUYY0zKZPVCO5N6ItIb+HPw4UFAM1ATfHxScF+RG3GNAOpUdZUb9zfOWIIxManqVmAYgIjcDuxW1V+Gvye4YExU1d/2CikzgsDxGpZgMph1kUy7iMjg4B6n3wFrgf4iUhv2+mQR+X3w+z4i8pyIrA7WHTklwvUKRWR28JpvichVwec/FZHbReTN4PNfEZHDgP8GpgXr/XwtPb+1SZS1YExHHAV8V1W/LyKx/i79GrhHVVcFdw2/ALTeTPcDoC9wnKo2i0ivsNeqVbVcRK4Brg/e7/fAF6p6X9J+G5N0lmBMR2xU1X86eN8ZwJBATwqAniLiU9X6Vu+5T1WbAVQ1vF5LaBPgGuCsDsZs0sgSjOmIPWHf+9m/JEPnsO+F+APCQvTyDfuC/23G/s5mFRuDMUkRHODdLiKHi0gBMCHs5VeBH4YeiMiwCJd4BfiBiHiC
7+kV4T3hdhEoM2kymCUYk0w3Ai8RmNb+NOz5HwKnBgdp/wVcEeFnHwI+B94SkfXAhXHu9SfgwuDgrw3yZijbTW2MSRlrwRhjUsYSjDEmZSzBGGNSxhKMMSZlLMEYY1LGEowxJmUswRhjUub/AzIvcvZ+7hFcAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 288x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Scatter plot of predicted vs. true values\n",
    "plt.figure(figsize=(4, 4))\n",
    "plt.scatter(y_train, y_train_pred_lr)\n",
    "plt.plot([-3, 3], [-3, 3], '--k')   # the data are standardized, so +/-3 standard deviations covers the range\n",
    "plt.axis('tight')\n",
    "plt.xlabel('True cnt')\n",
    "plt.ylabel('Predicted cnt')\n",
    "plt.tight_layout()"
   ]
  },
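  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predicted and true values are essentially linearly related, and because some outliers were removed, no point in the scatter plot deviates wildly from the diagonal, which suggests the predictions are reasonable."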
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAaEAAAEkCAYAAACG1Y6pAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGS9JREFUeJzt3X+01XWd7/HnWzyBiWLqQfFXqKX5+2BHgmsaRaWFqd2pZd5EXaKwTGfVlE1p947k1EyzJFt5dRwoTU1KjXJsaryjlwtDek0HHMYwLJmSG0iAOIpoJj/e94/9PcwRzi/Orw/s/XystdfZ3x/7+31/95b98vP9fvbnG5mJJEkl7Fa6AElS4zKEJEnFGEKSpGIMIUlSMYaQJKkYQ0iSVIwhpEEXEU9FxITSdZQUER+NiN9FxIaIGDOI+90QEUd0suziiHi4n/bzbES8vz+2pfpmCKlfdfTls+2XW2Yel5nzu9nO6IjIiNh9gEotbQZwZWYOz8x/3XZhdeyvVKGxMiJuiIghfd1ptb/f9HU7Un8xhNSQdoJweyvwVDfrnJSZw4H3AOcBlwx4VdIgM4Q06Nq3liJibEQsjIj1EbE6Im6oVltQ/X2xag2Mj4jdIuK/R8TyiFgTEXdGxIh2272wWrYuIv7HNvuZHhFzIuKuiFgPXFzt+9GIeDEiVkXETRHxpnbby4j4VEQ8ExEvR8RfRsSR1WvWR8S97dff5hg7rDUihkbEBmAI8G8R8e/dvV+ZuQx4BGhpt/0REXFrVffKiPhKW0spIt4WEf8cES9FxPMRcc82x/S26vl+EfHj6lgeB45st952LdGImB8Rl1bPj4yI/1O9189HxOyI2KeT96Kzz1gyhFTcN4FvZube1L4E763mn1793ac6hfQocHH1eC9wBDAcuAkgIo4F/hb4JDAKGAEcvM2+zgHmAPsAs4HNwJ8B+wPjgYnAp7Z5zZnAO4FxwJ8Ds6p9HAocD5zfyXF1WGtm/rFq3UCtpXNkxy//TxHxDuA0YFm72XcAm4C3AWOADwKXVsv+EngQeAtwCPA/O9n0zcBr1N6vS9ixllYAfw0cBBxD7f2Y3sm6nX3GkiGkAfH3VevixYh4kVo4dGYj8LaI2D8zN2Tmz7tY95PADZn5m8zcAFwNfKL6v/WPAf+QmQ9n5uvAXwDbDoz4aGb+fWZuycw/ZOaizPx5Zm7KzGeBmdROfbX3N5m5PjOfApYAD1b7fwl4gFoA7GitPfVERLwCLAXmU72PEXEA8CHgM5n5SmauAb4BfKJ63UZqp/sOyszXMnO7zgZVq+lPgL+otrGEWrD1SGYuy8yHqlBdC9zA9u9dmx35jNVgDCENhHMzc5+2B9u3LtqbAhwFPB0R/xIRZ3Wx7kHA8nbTy4HdgQOqZb9rW5CZrwLrtnn979pPRMRREfGTiPh9dYrur6i1itpb3e75HzqYHk7Huqq1p06utn8e8C5gz2r+W4EmYFW7oJ8JjKyW/zm1lsrjUeuJ2FELp7mqp/17sryD9ToUESMj4u7qVOB64C62f+/a7MhnrAZjCKmozHwmM8+n9gX6N8CciNiT7VsxAM9R+wJucxi1U1KrgVXUTj0BEBF7APttu7ttpm8BngbeXp0quobal3d/6KrWHsuae4FHqbXuoBYcfwT2bxf2e2fmcdVrfp+Zl2XmQcA04G/brgO1s7aq59BtamzzSvX3ze3mHdju+V9Tez9PrN67C+jkveviM5YMIZUVERdERHNmbgFerGZvpvYluYXa9ZQ23wf+LCIOj4jh1Fou92TmJmrXej4SEf+l6izwZboPlL2A9cCG6rrL5f12YF3X2htfA6ZGxIGZuYraNZ+vR8TeVSeIIyPiPQAR8fGIaAvk/6AWFpvbbywzNwM/AqZHxJura2oXtVu+FlgJXBARQ6rWVPvrV3sBG6h1HDkY+HxnhXfxGUuGkIo7E3iq6jH2TeAT
1XWMV4GvAo9Up5zGAbcB36XWc+631C6q/ylAdc3mT4G7qbWKXgbWUGsxdOYq4L9V634LuKeLdXdUp7X2Rmb+Avhn/vPL/kLgTcAvqQXNHGodDABOAR6r3tMfA5/OzN92sNkrqZ3u+z1wO/CdbZZfVu1vHXAc8H/bLfsytdOFLwE/pRZonenwM+76iNUowpvaqR5VrY8XqZ1q6+gLWNJOwJaQ6kZEfKQ6tbQntREJfgE8W7YqSV0xhFRPzqHWIeA54O3UTvvY1Jd2Yp6OkyQVY0tIklTMoA7iuP/+++fo0aMHc5eSpAIWLVr0fGY2d7feoIbQ6NGjWbhw4WDuUpJUQET0aAQOT8dJkooxhCRJxRhCkqRiSt9dUlKD2LhxIytWrOC11xyxp54MGzaMQw45hKampl693hCSNChWrFjBXnvtxejRo4nor8HKVVJmsm7dOlasWMHhhx/eq214Ok7SoHjttdfYb7/9DKA6EhHst99+fWrdGkKSBo0BVH/6+pkaQpKkYrwmJKmIadP6d3szZ3a/zpAhQzjhhBPYtGkThx9+ON/97nfZZ599dnhfl156KZ/97Gc59thj3zD/9ttvZ+HChdx00007vE2A4cOHs2HDhh6tO2HCBGbMmEFra+vWeQsXLuTOO+/kxhtv7NX+S7AlJKlh7LHHHixevJglS5aw7777cvPNN/dqO9/+9re3C6CdQWtr64AH0ObN/XtTXENI0g6ZNq3rx65i/PjxrFy5cuv09ddfzymnnMKJJ57ItddeC8Arr7zCpEmTOOmkkzj++OO5557azXcnTJiwdQiy73znOxx11FG85z3v4ZFHHtm6vYsvvpg5c+ZsnR4+fDgAGzZsYOLEiZx88smccMIJ3H///dvVtmrVKk4//XRaWlo4/vjj+dnPftajY5o/fz5nnXUWANOnT+eSSy5hwoQJHHHEEW8Ip7vuuouxY8fS0tLCtGnTtgbL5ZdfTmtrK8cdd9zW9wBqQ65dd911vPvd7+YHP/hBj2rpKU/HSWo4mzdvZu7cuUyZMgWABx98kGeeeYbHH3+czOTss89mwYIFrF27loMOOoif/vSnALz00ktv2M6qVau49tprWbRoESNGjOC9730vY8aM6XLfw4YN47777mPvvffm+eefZ9y4cZx99tlvuMD/ve99jzPOOIMvfelLbN68mVdffbVXx/n0008zb948Xn75ZY4++mguv/xyli1bxj333MMjjzxCU1MTn/rUp5g9ezYXXnghX/3qV9l3333ZvHkzEydO5Mknn+TEE0/cWvfDDz/cqzq6YghJahh/+MMfaGlp4dlnn+Wd73wnH/jAB4BaCD344INbA2TDhg0888wznHbaaVx11VV84Qtf4KyzzuK00057w/Yee+wxJkyYQHNzbbDo8847j1//+tdd1pCZXHPNNSxYsIDddtuNlStXsnr1ag488MCt65xyyilccsklbNy4kXPPPZeWlpZeHe+kSZMYOnQoQ4cOZeTIkaxevZq5c+eyaNEiTjnllK3vyciRIwG49957mTVrFps2bWLVqlX88pe/3BpC5513Xq9q6I6n4yQ1jLZrQsuXL+f111/fek0oM7n66qtZvHgxixcvZtmyZUyZMoWjjjqKRYsWccIJJ3D11Vdz3XXXbbfNzroo77777mzZsmXr9l9//XUAZs+ezdq1a1m0aBGLFy/mgAMO2O53NqeffjoLFizg4IMPZvLkydx55529Ot6hQ4dufT5kyBA2bdpEZnLRRRdtPdZf/epXTJ8+nd/+9rfMmDGDuXPn8uSTTzJp0qQ31LXnnnv2qobuGEKSGs6IESO48cYbmTFjBhs3buSMM87gtttu29ozbeXKlaxZs4bnnnuON7/5zVxwwQVcddVVPPHEE2/Yzrve9S7mz5/PunXr2Lhx4xuul4wePZpFixYBcP/997Nx40agdkpv5MiRNDU1MW/ePJYv3/6OB8uXL2fkyJFcdtllTJkyZbv99sXEiROZM2cO
a9asAeCFF15g+fLlrF+/nj333JMRI0awevVqHnjggX7bZ1c8HSepiJ50qR5IY8aM4aSTTuLuu+9m8uTJLF26lPHjxwO1TgR33XUXy5Yt4/Of/zy77bYbTU1N3HLLLW/YxqhRo5g+fTrjx49n1KhRnHzyyVsv8l922WWcc845jB07lokTJ25tSXzyk5/kIx/5CK2trbS0tPCOd7xju9rmz5/P9ddfT1NTE8OHD++0JTRp0qStY7aNHz+eK664otvjPvbYY/nKV77CBz/4QbZs2UJTUxM333wz48aNY8yYMRx33HEcccQRnHrqqT1/M/sgMnNQdgTQ2tqa3tRO2rV11wOus3BZunQpxxxzTP8XpOI6+mwjYlFmtnbykq08HSdJKsYQkiQVYwhJGjSDefpfg6Ovn6khJGlQDBs2jHXr1hlEdaTtfkLDhg3r9Ta67R0XEcOABcDQav05mXltRBwO3A3sCzwBTM7M13tdiaS6dsghh7BixQrWrl1buhT1o7Y7q/ZWT7po/xF4X2ZuiIgm4OGIeAD4LPCNzLw7Iv4OmALc0tWGJDWupqamXt99U/Wr29NxWdM2tnhT9UjgfUDb6Hx3AOcOSIWSpLrVo2tCETEkIhYDa4CHgH8HXszMTdUqK4CDB6ZESVK96lEIZebmzGwBDgHGAh394qzDq40RMTUiFkbEQs8FS5La26HecZn5IjAfGAfsExFt15QOAZ7r5DWzMrM1M1vbRpqVJAl6EEIR0RwR+1TP9wDeDywF5gEfq1a7CNj+zkySJHWhJ73jRgF3RMQQaqF1b2b+JCJ+CdwdEV8B/hW4dQDrlCTVoW5DKDOfBLa7VWBm/oba9SFJknrFERMkScUYQpKkYgwhSVIxhpAkqRhDSJJUjCEkSSrGEJIkFWMISZKKMYQkScX0ZNgeSeqxadO6X2fmzIGvQ7sGW0KSpGIMIUlSMYaQJKkYQ0iSVIwhJEkqxhCSJBVjCEmSivF3QlJh/q5GjcyWkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxdhFW2oQdgXXzsiWkCSpGENIklSMISRJKqbbEIqIQyNiXkQsjYinIuLT1fzpEbEyIhZXjw8PfLmSpHrSk44Jm4DPZeYTEbEXsCgiHqqWfSMzZwxceZKketZtCGXmKmBV9fzliFgKHDzQhUmS6t8OddGOiNHAGOAx4FTgyoi4EFhIrbX0Hx28ZiowFeCwww7rY7mSOtOTLtjSzqbHHRMiYjjwQ+AzmbkeuAU4Emih1lL6ekevy8xZmdmama3Nzc39ULIkqV70KIQioolaAM3OzB8BZObqzNycmVuAbwFjB65MSVI96knvuABuBZZm5g3t5o9qt9pHgSX9X54kqZ715JrQqcBk4BcRsbiadw1wfkS0AAk8C3hGWpK0Q3rSO+5hIDpY9I/9X44kqZE4YoIkqRhDSJJUjLdykHYB/gZI9cqWkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxdhFWxpgdq+WOmdLSJJUjCEkSSrGEJIkFWMISZKKMYQkScUYQpKkYuyiLaluddc9fubMwalDnbMlJEkqxhCSJBVjCEmSijGEJEnFGEKSpGIMIUlSMXbRlrTVYI34bddptbElJEkqxhCSJBVjCEmSijGEJEnFdBtCEXFoRMyLiKUR8VREfLqav29EPBQRz1R/3zLw5UqS6klPWkKbgM9l5jHAOOCKiDgW+CIwNzPfDsytpiVJ6rFuQygzV2XmE9Xzl4GlwMHAOcAd1Wp3AOcOVJGSpPq0Q9eEImI0MAZ4DDggM1dBLaiAkZ28ZmpELIyIhWvXru1btZKkutLjEIqI4cAPgc9k5vqevi4zZ2Vma2a2Njc396ZGSVKd6lEIRUQTtQCanZk/qmavjohR1fJRwJqBKVGSVK960jsugFuBpZl5Q7tFPwYuqp5fBNzf/+VJkupZT8aOOxWYDPwiIhZX864BvgbcGxFTgP8HfHxgSpQk1atuQygz
Hwaik8UT+7ccSVIjccQESVIxhpAkqRhDSJJUjCEkSSrGEJIkFWMISZKKMYQkScUYQpKkYnoyYoIk7XSmTStdgfqDLSFJUjGGkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxfg7IUk7HX8D1DhsCUmSijGEJEnFGEKSpGIMIUlSMYaQJKkYQ0iSVIwhJEkqxhCSJBVjCEmSijGEJEnFdBtCEXFbRKyJiCXt5k2PiJURsbh6fHhgy5Qk1aOetIRuB87sYP43MrOlevxj/5YlSWoE3YZQZi4AXhiEWiRJDaYv14SujIgnq9N1b+lspYiYGhELI2Lh2rVr+7A7SVK96W0I3QIcCbQAq4Cvd7ZiZs7KzNbMbG1ubu7l7iRJ9ahXIZSZqzNzc2ZuAb4FjO3fsiRJjaBXIRQRo9pNfhRY0tm6kiR1pts7q0bE94EJwP4RsQK4FpgQES1AAs8C3gdRkrTDug2hzDy/g9m3DkAtkqQG44gJkqRiDCFJUjGGkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxRhCkqRiDCFJUjGGkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxXR7UztJnZvmPYWlPrElJEkqxhCSJBVjCEmSijGEJEnFGEKSpGIMIUlSMYaQJKkYQ0iSVIwhJEkqxhCSJBVjCEmSiuk2hCLitohYExFL2s3bNyIeiohnqr9vGdgyJUn1qCctoduBM7eZ90Vgbma+HZhbTUuStEO6DaHMXAC8sM3sc4A7qud3AOf2c12SpAbQ22tCB2TmKoDq78jOVoyIqRGxMCIWrl27tpe7kyTVowHvmJCZszKzNTNbm5ubB3p3kqRdSG9DaHVEjAKo/q7pv5IkSY2ityH0Y+Ci6vlFwP39U44kqZH0pIv294FHgaMjYkVETAG+BnwgIp4BPlBNS5K0Q3bvboXMPL+TRRP7uRZJUoNxxARJUjHdtoSkXdG0ad2vM3PmwNchqWu2hCRJxRhCkqRiDCFJUjGGkCSpGENIklSMISRJKsYQkiQVYwhJkooxhCRJxRhCkqRiDCFJUjGGkCSpGENIklSMISRJKsZbOahh9eR2D5IGli0hSVIxhpAkqRhDSJJUjCEkSSrGEJIkFWMISZKKsYu2JHWhJ135Z84c+DrqlS0hSVIxhpAkqRhDSJJUTJ+uCUXEs8DLwGZgU2a29kdRkqTG0B8dE96bmc/3w3YkSQ3G03GSpGL62hJK4MGISGBmZs7adoWImApMBTjssMP6uDtJ6j/9NZJ6d9uxC3fn+toSOjUzTwY+BFwREadvu0JmzsrM1sxsbW5u7uPuJEn1pE8hlJnPVX/XAPcBY/ujKElSY+h1CEXEnhGxV9tz4IPAkv4qTJJU//pyTegA4L6IaNvO9zLzf/VLVZKkhtDrEMrM3wAn9WMtkqQGYxdtSVIxhpAkqRhv5aB+1V+/u+iOv7uQ6oMtIUlSMYaQJKkYQ0iSVIwhJEkqxhCSJBVjCEmSirGLdh3oj27Ru1qX58HqCi5pYNkSkiQVYwhJkooxhCRJxRhCkqRiDCFJUjGGkCSpGLtoq8fsFi2pv9kSkiQVYwhJkooxhCRJxRhCkqRiDCFJUjGGkCSpmF2ui3ZPugnvLCNC11utknpnV/r3NdjfSbaEJEnFGEKSpGIMIUlSMYaQJKmYPoVQRJwZEb+KiGUR8cX+KkqS1Bh6HUIRMQS4GfgQcCxwfkQc21+FSZLqX19aQmOBZZn5m8x8HbgbOKd/ypIkNYLIzN69MOJjwJmZeWk1PRl4V2Zeuc16U4Gp1eTRwK+62Oz+wPO9KmjX16jH7nE3nkY99kY77rdmZnN3K/Xlx6rRwbztEi0zZwGzerTBiIWZ2dqHmnZZjXrsHnfjadRjb9Tj7k5fTsetAA5tN30I8FzfypEkNZK+hNC/AG+PiMMj4k3AJ4Af909ZkqRG0OvTcZm5KSKuBP4JGALclplP9bGeHp22q1ONeuwed+Np1GNv
1OPuUq87JkiS1FeOmCBJKsYQkiQVs9OFUERcHxFPR8STEXFfROxTuqbBEBEfj4inImJLRDREN85GHPYpIm6LiDURsaR0LYMpIg6NiHkRsbT67/zTpWsaLBExLCIej4h/q479y6Vr2pnsdCEEPAQcn5knAr8Gri5cz2BZAvxXYEHpQgZDAw/7dDtwZukiCtgEfC4zjwHGAVc0yOcN8EfgfZl5EtACnBkR4wrXtNPY6UIoMx/MzE3V5M+p/f6o7mXm0szsajSJetOQwz5l5gLghdJ1DLbMXJWZT1TPXwaWAgeXrWpwZM2GarKpetgjrLLThdA2LgEeKF2EBsTBwO/aTa+gQb6UGl1EjAbGAI+VrWTwRMSQiFgMrAEeysyGOfbu9GXYnl6LiP8NHNjBoi9l5v3VOl+i1oSfPZi1DaSeHHcD6dGwT6ovETEc+CHwmcxcX7qewZKZm4GW6hr3fRFxfGY21HXBzhQJocx8f1fLI+Ii4CxgYtbRD5m6O+4G47BPDSYimqgF0OzM/FHpekrIzBcjYj6164KGEDvh6biIOBP4AnB2Zr5auh4NGId9aiAREcCtwNLMvKF0PYMpIprbevlGxB7A+4Gny1a189jpQgi4CdgLeCgiFkfE35UuaDBExEcjYgUwHvhpRPxT6ZoGUtX5pG3Yp6XAvf0w7NNOLyK+DzwKHB0RKyJiSumaBsmpwGTgfdW/68UR8eHSRQ2SUcC8iHiS2v98PZSZPylc007DYXskScXsjC0hSVKDMIQkScUYQpKkYgwhSVIxhpAkqRhDSJJUjCEkSSrm/wPw6VlSakXHFQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Inspect the distribution of prediction residuals on the test set to check the model assumption: zero-mean Gaussian noise\n",
    "f, ax = plt.subplots(figsize=(6, 4)) \n",
    "f.tight_layout() \n",
    "ax.hist(y_test - y_test_pred_lr,bins=40, label='Residuals Linear', color='b', alpha=.6); \n",
    "ax.set_title(\"Histogram of Residuals\") \n",
    "ax.legend(loc='best');"
   ]
  },
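  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plotting the residuals between predicted and true values on the test set shows a roughly Gaussian shape, yet the r2_score reported above is negative. This supports the earlier hypothesis: the features of the training and test sets are highly similar, so the model trained on the training set also produces normally distributed residuals on the test set, but the true values in the test set are markedly larger overall than those in the training set, which causes the gap between predictions and true values on the test set."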
   ]
  },
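  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect described above can be reproduced on a toy example. The sketch below is not part of the original analysis: the numbers are made up, and `r2` is a hypothetical helper that follows the usual definition 1 - SS_res / SS_tot. It illustrates that predictions which track the shape of the target perfectly can still receive a negative score once every true value is shifted by a constant.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "def r2(y_true, y_pred):\n",
    "    # Coefficient of determination: 1 - SS_res / SS_tot\n",
    "    y_bar = mean(y_true)\n",
    "    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))\n",
    "    ss_tot = sum((t - y_bar) ** 2 for t in y_true)\n",
    "    return 1 - ss_res / ss_tot\n",
    "\n",
    "# Hypothetical predictions\n",
    "y_pred = [0.0, 0.5, 1.0, 1.5, 2.0]\n",
    "\n",
    "# True values with the same shape and the same level as the predictions\n",
    "y_same = [0.0, 0.5, 1.0, 1.5, 2.0]\n",
    "# Same shape, but every true value is 1.5 higher (for example, a busier year)\n",
    "y_shifted = [t + 1.5 for t in y_same]\n",
    "\n",
    "score_same = r2(y_same, y_pred)\n",
    "score_shifted = r2(y_shifted, y_pred)\n",
    "print(score_same, score_shifted)\n",
    "```\n",
    "\n",
    "The residuals in the shifted case are all identical, so their spread is as tight as it can be; the score is negative only because the constant offset is larger than the spread of the true values themselves."
   ]
  },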
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\utils\\validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 0.29344215,  0.00914217, -0.05366267,  0.03006426,  0.01353414,\n",
       "       -0.19898028,  0.64422447, -0.09221089, -0.09760748])"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
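   ],
   "source": [
    "# Linear model whose parameters are optimized by stochastic gradient descent\n",
    "from sklearn.linear_model import SGDRegressor\n",
    "\n",
    "# Initialize the regressor with the default configuration (only max_iter is set)\n",
    "sgdr = SGDRegressor(max_iter=1000)\n",
    "\n",
    "# Training: parameter estimation (passing y_train.ravel() would avoid the DataConversionWarning)\n",
    "sgdr.fit(X_train, y_train)\n",
    "\n",
    "# Prediction\n",
    "#sgdr_y_predict = sgdr.predict(X_test)\n",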
    "\n",
    "sgdr.coef_"
   ]
  },
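  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Stochastic gradient descent (SGD) is suited to large training sets. Full-batch gradient descent has to accumulate the gradient over every sample at each iteration, so with a very large number of samples each iteration is slow and the total training time and computation grow. SGD updates the parameters from one sample (or a small batch) at a time, which lowers the cost per update and often reaches a good solution sooner. The added randomness can also help escape local optima in non-convex problems, although the squared-error objective of linear regression is convex and has no spurious local optima. The dataset here is very small, so SGD may actually give a slightly worse fit than ordinary least squares."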
   ]
  },
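  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between the two update rules can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are synthetic (y = 2x + 1 plus small noise), `batch_gd` and `sgd` are hypothetical helpers, and both use a constant learning rate with no penalty term, so this is not how `SGDRegressor` itself is implemented.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "rng = random.Random(0)\n",
    "# Synthetic data (an assumption for illustration): y = 2x + 1 + small noise\n",
    "xs = [rng.uniform(-1, 1) for _ in range(200)]\n",
    "ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]\n",
    "\n",
    "def batch_gd(xs, ys, lr=0.5, n_iter=200):\n",
    "    # Every update averages the gradient over the whole data set\n",
    "    w, b = 0.0, 0.0\n",
    "    n = len(xs)\n",
    "    for _ in range(n_iter):\n",
    "        errs = [w * x + b - y for x, y in zip(xs, ys)]\n",
    "        grad_w = sum(e * x for e, x in zip(errs, xs)) / n\n",
    "        grad_b = sum(errs) / n\n",
    "        w -= lr * grad_w\n",
    "        b -= lr * grad_b\n",
    "    return w, b\n",
    "\n",
    "def sgd(xs, ys, lr=0.05, n_epochs=50):\n",
    "    # Every update uses the gradient of a single sample\n",
    "    w, b = 0.0, 0.0\n",
    "    order = list(range(len(xs)))\n",
    "    for _ in range(n_epochs):\n",
    "        rng.shuffle(order)  # visit the samples in a new random order each epoch\n",
    "        for i in order:\n",
    "            err = w * xs[i] + b - ys[i]\n",
    "            w -= lr * err * xs[i]\n",
    "            b -= lr * err\n",
    "    return w, b\n",
    "\n",
    "w_gd, b_gd = batch_gd(xs, ys)\n",
    "w_sgd, b_sgd = sgd(xs, ys)\n",
    "print('batch GD:', w_gd, b_gd)\n",
    "print('SGD     :', w_sgd, b_sgd)\n",
    "```\n",
    "\n",
    "Both runs should land near the true slope 2 and intercept 1. With a constant learning rate the SGD estimate keeps jittering around the optimum, which is one reason it can score slightly worse than an exact solver on a small data set."
   ]
  },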
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The value of default measurement of SGDRegressor on test is -0.7059382463912496\n",
      "The value of default measurement of SGDRegressor on train is 0.759852002000119\n"
     ]
    }
   ],
   "source": [
    "# Use SGDRegressor's built-in score method (the metric is R^2, as in r2_score) and print the results\n",
    "print ('The value of default measurement of SGDRegressor on test is', sgdr.score(X_test, y_test))\n",
    "print ('The value of default measurement of SGDRegressor on train is', sgdr.score(X_train, y_train))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.2 Regularized linear regression (L2 penalty --> ridge regression)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of RidgeCV on test is -0.717885446567879\n",
      "The r2 score of RidgeCV on train is 0.7594162834790815\n"
     ]
    }
   ],
   "source": [
    "# Ridge regression / L2 regularization\n",
    "#class sklearn.linear_model.RidgeCV(alphas=(0.1, 1.0, 10.0), fit_intercept=True, \n",
    "#                                  normalize=False, scoring=None, cv=None, gcv_mode=None, \n",
    "#                                  store_cv_values=False)\n",
    "from sklearn.linear_model import RidgeCV\n",
    "\n",
    "# Candidate values for the hyperparameter (regularization strength)\n",
    "alphas = [0.01, 0.1, 1, 10, 100]\n",
    "#n_alphas = 20\n",
    "#alphas = np.logspace(-5,2,n_alphas)\n",
    "\n",
    "# Create a RidgeCV instance\n",
    "ridge = RidgeCV(alphas=alphas, store_cv_values=True)\n",
    "\n",
    "# Train the model\n",
    "ridge.fit(X_train, y_train)\n",
    "\n",
    "# Predict\n",
    "y_test_pred_ridge = ridge.predict(X_test)\n",
    "y_train_pred_ridge = ridge.predict(X_train)\n",
    "\n",
    "\n",
    "# Evaluate: use r2_score to measure performance on the test and training sets\n",
    "print ('The r2 score of RidgeCV on test is', r2_score(y_test, y_test_pred_ridge))\n",
    "print ('The r2 score of RidgeCV on train is', r2_score(y_train, y_train_pred_ridge))"
   ]
  },
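  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "RidgeCV has cross-validation built in. With the default arguments it uses an efficient form of leave-one-out cross-validation and scores each candidate by mean squared error; store_cv_values=True is set so that the cross-validation results for each alpha can be inspected later. alpha is the hyperparameter that constrains model complexity, and candidate values are usually spaced evenly on a log scale; five values are used here. The resulting r2_score on the training set is about 0.00044 lower than that of the unregularized linear regression above. A regularized fit can never score higher than ordinary least squares on its own training data, so such a small drop suggests that the first model overfits only slightly, if at all."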
   ]
  },
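  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The selection procedure described above can be sketched without scikit-learn. The example below is an illustration under stated assumptions: the data are synthetic and one-dimensional with no intercept, `ridge_fit`, `loo_mse` and `train_mse` are hypothetical helpers, and the leave-one-out loop refits the model for every held-out sample, whereas RidgeCV obtains the same kind of errors more efficiently.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "rng = random.Random(1)\n",
    "# Synthetic 1-D data without intercept (an assumption for illustration)\n",
    "xs = [rng.uniform(-1, 1) for _ in range(30)]\n",
    "ys = [1.5 * x + rng.gauss(0, 0.3) for x in xs]\n",
    "\n",
    "def ridge_fit(xs, ys, alpha):\n",
    "    # Closed-form 1-D ridge solution: w = sum(x*y) / (sum(x^2) + alpha)\n",
    "    sxy = sum(x * y for x, y in zip(xs, ys))\n",
    "    sxx = sum(x * x for x in xs)\n",
    "    return sxy / (sxx + alpha)\n",
    "\n",
    "def loo_mse(xs, ys, alpha):\n",
    "    # Leave-one-out: refit without sample i, then score on sample i\n",
    "    total = 0.0\n",
    "    for i in range(len(xs)):\n",
    "        w = ridge_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], alpha)\n",
    "        total += (ys[i] - w * xs[i]) ** 2\n",
    "    return total / len(xs)\n",
    "\n",
    "def train_mse(xs, ys, alpha):\n",
    "    # Error on the same data the model was fitted to\n",
    "    w = ridge_fit(xs, ys, alpha)\n",
    "    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)\n",
    "\n",
    "# Candidates spaced evenly on a log scale: 0.01, 0.1, 1, 10, 100\n",
    "alphas = [10.0 ** k for k in range(-2, 3)]\n",
    "cv_scores = {a: loo_mse(xs, ys, a) for a in alphas}\n",
    "best_alpha = min(cv_scores, key=cv_scores.get)\n",
    "train_scores = [train_mse(xs, ys, a) for a in alphas]\n",
    "print('best alpha by leave-one-out:', best_alpha)\n",
    "print('training MSE per alpha:', train_scores)\n",
    "```\n",
    "\n",
    "The training error grows as alpha increases, because the penalty pulls the coefficient away from the least-squares solution; the leave-one-out error is what decides which alpha is kept."
   ]
  },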
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEKCAYAAAA4t9PUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAH61JREFUeJzt3X2QVfWd5/H3t7sBFVAEGkQeBBJESXi0YUyMJiJJ+ZAgPgBxk12TddbKbmVrZq3MxJRbpjZbU1uR3UnV7GR2kkwySWpSSXejKCpGDaJmsov25RlEFPHhXmjoRpBHaeju7/5xztXr5dzmtrfPffy8qrq4fc7vnPM9B+hPn3Pv+R5zd0RERD6uulIXICIilU1BIiIiBVGQiIhIQRQkIiJSEAWJiIgUREEiIiIFUZCIiEhBFCQiIlIQBYmIiBSkodQFFMPo0aN98uTJpS5DRKSibNiw4aC7N55rXE0EyeTJk0kkEqUuQ0SkopjZ2/mM06UtEREpiIJEREQKoiAREZGCKEhERKQgChIRESmIgkRERAqiIBERkYIoSEREqlDq8El++PtX6Th2KvZtKUhERKpQayLFP77wBmd6PPZtKUhERKpMb6+zckOKz31yNONHnB/79hQkIiJV5k9vHGTve++zrGliUbanIBERqTItiRQjLhjElz41tijbU5CIiFSR906e5ukd+1kyZzxDGuqLsk0FiYhIFXls8z5Od/cW7bIWKEhERKpKc1uST4+/kBmXXli0bSpIRESqxPa9R3il/SjLi3g2AgoSEZGq0ZJIMrihjsWzxxd1uwoSEZEqcOpMD49u2stNn76Eiy4YVNRtK0hERKrA0zv2c/RUd1HfZE9TkIiIVIGWRJIJF5/PZ6aOKvq2FSQiIhUueegkf9r9LkuvmkhdnRV9+woSEZEK17ohhRnc2TShJNtXkIiIVLCeXmdlIsm10xqL0qAxioJERKSC/Wn3QfYdOcWyEp2NgIJERKSitSSSjLhgEF+cUZwGjVEUJCIiFerwidM8s+NAURs0RlGQiIhUqMc27+V0T3EbNEZRkIiIVCB3pzmRYub4i4raoDGKgkREpAJt33uUne1HWTa/tGcjoCAREalILYkkQxrqWDz70lKXoiAREak0p8708OjmsEHj+cVt0Bgl1iAxsxvNbJeZ7Taz+yPm32dmr5jZVjNba2aXhdOvN7PNGV+nzGxJOO+XZvZmxrw5ce6DiEi5eXrHfo6VqEFjlIa4Vmxm9cCPgS8CKaDNzFa7+ysZwzYBTe5+0sz+I/AQsNzd1wFzwvWMBHYDz2Qs91fuvjKu2kVEyllzW5KJI8/n6hI0aIwS5xnJAmC3u+9x99PA74BbMwe4+zp3Pxl+ux6IujXzTuCpjHEiIjUreegk//eN0jVojBJnkIwHkhnfp8JpudwDPBUx/avAb7Om/U14OexHZjaksDJFRCpHayIZNGi8qnQtUbLFGSRRUemRA82+DjQBK7KmjwNmAk9nTP4ecAUwHxgJfDfHOu81s4SZJTo7O/tfvYhImenpdVZuSHHdtEYuLVGDxihxBkkKyHwnaAKwL3uQmS0CHgAWu3tX1uxlwCp3P5Oe4O7tHugC/pngEtpZ3P2n7t7k7k2NjY0F7oqISOn96wcNGsvjTfa0OIOkDZhmZlPMbDDBJarVmQPMbC7wE4IQ6YhYx11kXdYKz1IwMwOWANtjqF1EpOy0JJJcfMEgFs0YU+pSPiK2T225e7eZfZvgslQ98At332FmPwAS7r6a4FLWMKA1yAXecffFAGY2meCM5oWsVf/GzBoJLp1tBr4V1z6IiJSLwydO8+yOA3zt6kklbdAYJbYgAXD3NcCarGkPZrxe1MeybxHx5ry7LxzAEkVEKsKqTUGDxuVl0BIlm+5sFxEpc+5OSyLJrAkXccUlpW3QGEVBIiJS5rbtPcKr+4+V3ZvsaQoSEZEyl27Q+JUy
aNAYRUEiIlLGTp3p4bHN+7h55riyaNAYRUEiIlLGfr89aNC4tKl87mTPpiARESljzW1JJo28gKunlEeDxigKEhGRMvXOuyf5f3veZVnThLJp0BhFQSIiUqZaNySpM7ijjBo0RlGQiIiUoQ8aNF7eyLiLyqdBYxQFiYhIGfrj6520l2GDxigKEhGRMtSSSDJy6GAWXTm21KWck4JERKTMHDpxmmdfOcCSOeMZ3FD+P6bLv0IRkRqzatNezvR4WTZojKIgEREpI+5OayLJ7AkXMf2S4aUuJy8KEhGRMrI1FTZorJCzEVCQiIiUlZZEkvMGlW+DxigKEhGRMvH+6R5Wb97HzZ8ex4XnlWeDxigKEhGRMvH7He0c6+pmaQXcO5JJQSIiUiaa25JcNuoCrp46stSl9IuCRESkDLz97gnW7znEsqaJmJVvg8YoChIRkTLQmkgFDRrnlXeDxigKEhGREks3aPz85Y1cctF5pS6n3xQkIiIl9uLrnew/WhkNGqMoSERESqylLWjQeEMFNGiMoiARESmhd4938YedB7htbmU0aIxSmVWLiFSJdIPGSr2sBQoSEZGScXdaEklmTxxRMQ0aoyhIRERKZEvqCK8dOM7yCj4bAQWJiEjJpBs0fnn2uFKXUhAFiYhICbx/uofHN+/j5pmV1aAxioJERKQEntoeNGis5DfZ0xQkIiIl0NyWZPKoC/izKZXVoDGKgkREpMjeOniCl948xNIKbNAYRUEiIlJkrRuSFdugMYqCRESkiLp7elm5IcUXpo+pyAaNURQkIiJF9MfXD3LgaBfLmqrjbAQUJCIiRdXclmTU0MEsvKIyGzRGiTVIzOxGM9tlZrvN7P6I+feZ2StmttXM1prZZeH0681sc8bXKTNbEs6bYmYvmdnrZtZsZoPj3AcRkYFSDQ0ao8S2J2ZWD/wYuAmYAdxlZjOyhm0Cmtx9FrASeAjA3de5+xx3nwMsBE4Cz4TL/BD4kbtPAw4D98S1DyIiA2nVpr109zrL5lf+vSOZ4ozEBcBud9/j7qeB3wG3Zg4IA+Nk+O16IOqi4Z3AU+5+0oLPyS0kCB2AXwFLYqleRGQAuTvNbUnmTBzB5WMrt0FjlDiDZDyQzPg+FU7L5R7gqYjpXwV+G74eBbzn7t15rlNEpCxsTr7H6x3HWV5lZyMADTGuO+ouG48caPZ1oAn4fNb0ccBM4OmPsc57gXsBJk2alF/FIiIxaUmkOH9QPV+eVdkNGqPEeUaSAjKjdwKwL3uQmS0CHgAWu3tX1uxlwCp3PxN+fxAYYWbpAIxcJ4C7/9Tdm9y9qbGxsYDdEBEpzMnT3Ty+JWjQOLzCGzRGiTNI2oBp4aesBhNcolqdOcDM5gI/IQiRjoh13MWHl7VwdwfWEbxvAnA38FgMtYuIDJintu3neFd3VV7WghiDJHwf49sEl6V2Ai3uvsPMfmBmi8NhK4BhQGv4Md8PgsbMJhOc0byQtervAveZ2W6C90x+Htc+iIgMhOZEkimjhzJ/8sWlLiUWcb5HgruvAdZkTXsw4/WiPpZ9i4g30t19D8EnwkREyt6bB0/w8puH+Osbp1dFg8Yo1XNHjIhIGWpNVFeDxigKEhGRmKQbNF4/fQxjL6yOBo1RFCQiIjF58fVOOo51sbQKnoLYFwWJiEhMmtuSjB42mBuuHFPqUmKlIBERicHB412s3dnBbXPHM6i+un/UVvfeiYiUyKqNYYPGKr+sBQoSEZEB5+60JJLMnTSCaVXWoDGKgkREZIBtSjdorIGzEVCQiIgMuNZEkvMH1XNLFTZojKIgEREZQEGDxnZumVWdDRqjKEhERAbQk1vbq7pBYxQFiYjIAGpNpJg6eihNl1Vng8YoChIRkQGyp/M4L791iKVNE6u2QWOUvIPEzD5nZt8MXzea2ZT4yhIRqTytG1LU1xl3zKutJ4DnFSRm9n2C54B8L5w0CPiXuIoSEak03T29PLwhxfXTGxlTxQ0ao+R7RnIbsBg4
AeDu+4Dqv8tGRCRPL7xWGw0ao+QbJKfDx9w6gJkNja8kEZHKk27QuPCK6m7QGCXfIGkxs58AI8zsPwB/AH4WX1kiIpWj81gXz73awe3zJlR9g8YoeT1q193/p5l9ETgKTAcedPdnY61MRKRCrNqUChs0Vu9TEPuSV5CEl7Kec/dnzWw6MN3MBrn7mXjLExEpb0GDxhTzJo3gk2Nq863jfM/BXgSGmNl4gsta3wR+GVdRIiKVYuM777G743hN3cmeLd8gMXc/CdwO/G93vw2YEV9ZIiKVoaUtyQWD67ll1qWlLqVk8g4SM/sM8DXgyXBaXpfFRESq1Ymubp7Yuo9bZo5j2JDa/ZGYb5D8BXA/8Ii77wjvan8uvrJERMrfk9vaOXG6p6Yva0H+ZxUngV7gLjP7OmCE95SIiNSq1kSSqY1DuaqGGjRGyTdIfgN8B9hOECgiIjXtjc7jtL11mPtvuqKmGjRGyTdIOt398VgrERGpIK2JoEHj7TXWoDFKvkHyfTP7J2At0JWe6O6PxFKViEgZ6+7p5eGNKa6fPoYxw2urQWOUfIPkm8AVBF1/05e2HFCQiEjNeX5XJ53Humr2TvZs+QbJbHefGWslIiIVojmRZPSwIVxfgw0ao+T78d/1ZqYbEEWk5nUcO8Vzr3Zwx7zxNdmgMUq+ZySfA+42szcJ3iMxwN19VmyViYiUoVUb99LT6zX53JFc8g2SG2OtQkSkArg7zYkkV112MZ8cM6zU5ZSNfNvIvx13ISIi5W7jO4fZ03mCh+74RKlLKSu6wCcikqfmDxo0jit1KWVFQSIikoegQWM7X541jqE13KAxioJERCQPT25t56QaNEaKNUjM7EYz22Vmu83s/oj595nZK2a21czWmtllGfMmmdkzZrYzHDM5nP5LM3vTzDaHX3Pi3AcREYCWsEHjvEm13aAxSmxBYmb1wI+BmwgegnVXxL0om4Cm8GPEK4GHMub9Gljh7lcCC4COjHl/5e5zwq/Nce2DiAjA7o7jJN4+zPKmiTXfoDFKnGckC4Dd7r7H3U8DvwNuzRzg7uvCJy8CrAcmAISB0+Duz4bjjmeMExEpqtYNSerrjNvUoDFSnEEyHkhmfJ8Kp+VyD/BU+Ppy4D0ze8TMNpnZivAMJ+1vwsthPzKzIVErM7N7zSxhZonOzs5C9kNEatiZnl4e3rCXhVeoQWMucQZJ1Plf5MOwwodlNQErwkkNwLUEz0CZD0wFvhHO+x5BA8n5wEjgu1HrdPefunuTuzc1NjZ+zF0QkVr3/K5ODh7vYpnuZM8pziBJAZlHfgKwL3uQmS0CHgAWu3tXxrKbwsti3cCjwDwAd2/3QBfwzwSX0EREYtHclqRx+BCun65fSHOJM0jagGlmNsXMBgNfBVZnDjCzucBPCEKkI2vZi80s/Te3EHglXGZc+KcBSwie2igiMuA6jp5i3a4Obp83ngY1aMwptrtq3L3bzL4NPA3UA79w9x1m9gMg4e6rCS5lDQNaw09CvOPui929x8y+A6wNA2MD8LNw1b8JA8aAzcC34toHEaltj2wKGjTqslbfYr09093XAGuypj2Y8XpRH8s+C5zVXdjdFw5kjSIiUdydlrYkTZddzCca1aCxLzpXExGJsOHtw+w5eIJlupP9nBQkIiIRmtuSDB1czy0z1aDxXBQkIiJZjnd18+S2dr4861I1aMyDgkREJMuTW/dx8nSPLmvlSUEiIpKlJZHiE41DmTdpRKlLqQgKEhGRDLs7jrHh7cMsn68GjflSkIiIZGhNpGioM26bO6HUpVQMBYmISOhMTy8Pb0yx8IoxNA6P7AcrERQkIiKhda92cPD4ad3J3k8KEhGRUEsiaND4BTVo7BcFiYgI6QaNndwxb4IaNPaTjpaICPDwxnSDRr3J3l8KEhGpee5OayLJgskjmaoGjf2mIBGRmpcIGzQu1dnIx6IgEZGa90GDxllq0PhxKEhEpKYd7+rmya3tfGX2pVwwWA0aPw4F
iYjUtCe27OP9M2rQWAgFiYjUtJZEkk+OGcbciWrQ+HEpSESkZu3uOMbGd95jeZMaNBZCQSIiNasl3aBx3vhSl1LRFCQiUpPO9PTyyMYUN1w5htHD1KCxEAoSEalJa3eqQeNAUZCISE1qTSQZM3wIn79cDRoLpSARkZpz4Ogp1u3q4I6r1KBxIOgIikjNeXhjil5Hl7UGiIJERGpK0KAxxYIpI5kyemipy6kKChIRqSltbx3mzYMndDYygBQkIlJTmtuSDBvSwM0zLyl1KVVDQSIiNePYqTOs2dbOV2aPU4PGAaQgEZGa8cTW9qBBoy5rDSgFiYjUjJZEkmljhjFHDRoHlIJERGrC6weOsemd91g+Xw0aB5qCRERqQnNbkoY6Y8lcNWgcaAoSEal6p7t7WbVpL4uuHKsGjTFQkIhI1Xvu1QO8e+I0y+ZPKHUpVUlBIiJVryWRYuyFQ7humho0xiHWIDGzG81sl5ntNrP7I+bfZ2avmNlWM1trZpdlzJtkZs+Y2c5wzORw+hQze8nMXjezZjMbHOc+iEhl23/kFM/v6uCOeWrQGJfYjqqZ1QM/Bm4CZgB3mdmMrGGbgCZ3nwWsBB7KmPdrYIW7XwksADrC6T8EfuTu04DDwD1x7YOIVD41aIxfnPG8ANjt7nvc/TTwO+DWzAHuvs7dT4bfrgcmAISB0+Duz4bjjrv7SQs+s7eQIHQAfgUsiXEfRKSCBQ0ak/zZlJFMVoPG2MQZJOOBZMb3qXBaLvcAT4WvLwfeM7NHzGyTma0Iz3BGAe+5e3ee6xSRGvbym4d4692TOhuJWZxBEnXHj0cONPs60ASsCCc1ANcC3wHmA1OBb/RznfeaWcLMEp2dnf2rXESqQnMi3aBxXKlLqWpxBkkKyPw1YAKwL3uQmS0CHgAWu3tXxrKbwsti3cCjwDzgIDDCzBr6WieAu//U3ZvcvamxUZ/UEKk1HzZovJTzB9eXupyqFmeQtAHTwk9ZDQa+CqzOHGBmc4GfEIRIR9ayF5tZOgEWAq+4uwPrgDvD6XcDj8W4DyJSoR7f0s6pM70sn6/LWnGLLUjCM4lvA08DO4EWd99hZj8ws8XhsBXAMKDVzDab2epw2R6Cy1przWwbwSWtn4XLfBe4z8x2E7xn8vO49kFEKldzIsnlY4cxe8JFpS6l6sXakN/d1wBrsqY9mPF6UR/LPgvMipi+h+ATYSIikXbtP8aW5Hv811uuVIPGItDdOSJSdVoSSQbVG7epQWNRKEhEpKpkNmgcpQaNRaEgEZGqsnbnAQ6dOK17R4pIQSIiVaUlkeSSC8/jusv1sf9iUZCISNXYf+QUL7zWyZ1XTaC+Tm+yF4uCRESqRrpB49ImPXekmBQkIlIVenudlkSSq6eO5LJRatBYTAoSEakKL791iLfVoLEkFCQiUhVa2pIMH9LATZ9Wg8ZiU5CISMU7euoMa7a385U5atBYCgoSEal4j2/ZFzRo1GWtklCQiEjFa2lLMn3scGapQWNJKEhEpKK9uv8oW1JHWDZ/oho0loiCREQqWktbSg0aS0xBIiIVK2jQmOKLM8YycujgUpdTsxQkIlKx/rDzAIdPnmGp3mQvKQWJiFSslkSScRedx3XT1KCxlBQkIlKR2o+8z4tq0FgWFCQiUpEe3hA2aLxKl7VKTUEiIhUnaNCY4jNTRzFp1AWlLqfmKUhEpOKsf/Nd3jl0kmXz1S6+HChIRKTitCZSDD9PDRrLhYJERCrKkffPsGZbO4tnX8p5g9SgsRwoSESkojy+ZR9d3b0sn6832cuFgkREKkpLIskVlwxn5ng1aCwXChIRqRg724+yNXWEZU1q0FhOFCQiUjFaEkkG1RtL1KCxrDSUuoBy9uBj23n5zUOR83L9NpTrd6So4bl+obIca8k9vh8b7WN8f9cfdQwsXI9ZsBd1ZtTVBfuUnl5nfDAve6zZR6dnjiU9L3tZIxwXjs9cD1BXZ2FdZ48l3EbmWD5SS47a6yyi7oix
4brT68wcS0a96Rqy9zGo8aP15Vw26ximt/+Rv4sPjttHl/3o8bEP96WO6GXJWNZy/38YSF3dPTy6aS9fmnGJGjSWGQVJHxqHDWHSyLNvdvIc4z3XjIglco3Nve7oOf2tpb/rzyXX8F533MFxeh16ep0zPR5MB3o9WLg3PaY3qMnD5T4cF64na2zmNtJjM5ft9Q+/93OM9T72Q/rvrBCyzGnpsMoKsLOC7sNAy172TG9v2KBR946UGwVJH/7zDdNKXYIUQTqsMkMHIkKo98OA9MzQyh7rfCRM02Ph7O1kj80MUCe4gzsdpOmxQWBmB2PEsmcFaxDGOZelr1A+u77ejHr8g+P44TrTNff29vHLQXr/cy2b9Xew6MqxXKsGjWVHQSI1z8yoN6jPfZFQRPqgN9tFRKQgChIRESmIgkRERAqiIBERkYIoSEREpCAKEhERKYiCRERECqIgERGRglh/W2NUIjPrBN7+mIuPBg4OYDkDRXX1j+rqH9XVP9Va12Xufs5WAjURJIUws4S7N5W6jmyqq39UV/+orv6p9bp0aUtERAqiIBERkYIoSM7tp6UuIAfV1T+qq39UV//UdF16j0RERAqiMxIRESmIgiSLma0ws1fNbKuZrTKzETnG3Whmu8xst5ndX4S6lprZDjPrNbOcn8Iws7fMbJuZbTazRBnVVezjNdLMnjWz18M/L84xric8VpvNbHWM9fS5/2Y2xMyaw/kvmdnkuGrpZ13fMLPOjGP050Wq6xdm1mFm23PMNzP7u7DurWY2rwxq+oKZHck4Vg/GXVO43Ylmts7Mdob/F/8iYky8xyt4Apq+0l/Al4CG8PUPgR9GjKkH3gCmAoOBLcCMmOu6EpgOPA809THuLWB0EY/XOesq0fF6CLg/fH1/1N9jOO94EY7ROfcf+E/AP4avvwo0l0ld3wD+vlj/njK2ex0wD9ieY/7NwFMEj5S/GnipDGr6AvBECY7VOGBe+Ho48FrE32Osx0tnJFnc/Rl37w6/XQ9EPSB6AbDb3fe4+2ngd8CtMde10913xbmNjyPPuop+vML1/yp8/StgSczb60s++59Z70rgBjOL+5GNpfh7yYu7vwgc6mPIrcCvPbAeGGFm40pcU0m4e7u7bwxfHwN2AuOzhsV6vBQkffv3BCmebTyQzPg+xdl/caXiwDNmtsHM7i11MaFSHK+x7t4OwX80YEyOceeZWcLM1ptZXGGTz/5/MCb8ReYIMCqmevpTF8Ad4eWQlWY2Meaa8lWu/wc/Y2ZbzOwpM/tUsTceXhKdC7yUNSvW41WTz2w3sz8Al0TMesDdHwvHPAB0A7+JWkXEtII//pZPXXm4xt33mdkY4FkzezX8TaqUdRX9ePVjNZPC4zUVeM7Mtrn7G4XWliWf/Y/lGJ1DPtt8HPitu3eZ2bcIzpoWxlxXPkpxvM5lI0FLkeNmdjPwKDCtWBs3s2HAw8BfuvvR7NkRiwzY8arJIHH3RX3NN7O7gS8DN3h4gTFLCsj8zWwCsC/uuvJcx77wzw4zW0Vw+aKgIBmAuop+vMzsgJmNc/f28BS+I8c60sdrj5k9T/Db3EAHST77nx6TMrMG4CLiv4xyzrrc/d2Mb39G8L5hOYjl31QhMn94u/saM/sHMxvt7rH34DKzQQQh8ht3fyRiSKzHS5e2spjZjcB3gcXufjLHsDZgmplNMbPBBG+OxvaJn3yZ2VAzG55+TfDBgchPmBRZKY7XauDu8PXdwFlnTmZ2sZkNCV+PBq4BXomhlnz2P7PeO4HncvwSU9S6sq6jLya4/l4OVgP/Lvw00tXAkfSlzFIxs0vS72uZ2QKCn6/v9r3UgGzXgJ8DO939b3MMi/d4FfsTBuX+BewmuJa4OfxKf5LmUmBNxribCT4d8QbBJZ6467qN4LeKLuAA8HR2XQSfvtkSfu0ol7pKdLxGAWuB18M/R4bTm4B/Cl9/FtgWHq9twD0x1nPW/gM/IPiFBeA8oDX89/cyMDXuY5Rn
Xf8j/Le0BVgHXFGkun4LtANnwn9f9wDfAr4Vzjfgx2Hd2+jjk4xFrOnbGcdqPfDZIh2rzxFcptqa8XPr5mIeL93ZLiIiBdGlLRERKYiCRERECqIgERGRgihIRESkIAoSEREpiIJEpA9mdrzA5VeGd833NeZ566Nzcr5jssY3mtnv8x0vUggFiUhMwl5L9e6+p9jbdvdOoN3Mrin2tqX2KEhE8hDeEbzCzLZb8LyX5eH0urAVxg4ze8LM1pjZneFiXyPjjnoz+z9hg8gdZvbfcmznuJn9LzPbaGZrzawxY/ZSM3vZzF4zs2vD8ZPN7I/h+I1m9tmM8Y+GNYjESkEikp/bgTnAbGARsCJsH3I7MBmYCfw58JmMZa4BNmR8/4C7NwGzgM+b2ayI7QwFNrr7POAF4PsZ8xrcfQHwlxnTO4AvhuOXA3+XMT4BXNv/XRXpn5ps2ijyMXyOoAtuD3DAzF4A5ofTW929F9hvZusylhkHdGZ8vyxs7d8QzptB0NYiUy/QHL7+FyCzAV/69QaC8AIYBPy9mc0BeoDLM8Z3ELSqEYmVgkQkP7keMtXXw6feJ+ihhZlNAb4DzHf3w2b2y/S8c8jsYdQV/tnDh/93/wtBj7PZBFcYTmWMPy+sQSRWurQlkp8XgeVmVh++b3EdQXPFfyV48FOdmY0leNxq2k7gk+HrC4ETwJFw3E05tlNH0P0X4N+E6+/LRUB7eEb0bwken5t2OeXR/VmqnM5IRPKziuD9jy0EZwl/7e77zexh4AaCH9ivETyZ7ki4zJMEwfIHd99iZpsIusPuAf6UYzsngE+Z2YZwPcvPUdc/AA+b2VKC7rwnMuZdH9YgEit1/xUpkJkN8+CpeKMIzlKuCUPmfIIf7teE763ks67j7j5sgOp6EbjV3Q8PxPpEctEZiUjhnjCzEcBg4L+7+34Ad3/fzL5P8Gzsd4pZUHj57W8VIlIMOiMREZGC6M12EREpiIJEREQKoiAREZGCKEhERKQgChIRESmIgkRERAry/wHNH4ItxAZVrAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "alpha is: 10.0\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>coef_lr</th>\n",
       "      <th>coef_ridge</th>\n",
       "      <th>columns</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>[0.6444774183345154]</td>\n",
       "      <td>[0.6275083240477665]</td>\n",
       "      <td>atemp</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[0.2932870174143726]</td>\n",
       "      <td>[0.2769855493322594]</td>\n",
       "      <td>season</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[0.03036364260009024]</td>\n",
       "      <td>[0.028434783389145735]</td>\n",
       "      <td>weekday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[0.013813330209271746]</td>\n",
       "      <td>[0.014400902854231179]</td>\n",
       "      <td>workingday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[0.009025595984837709]</td>\n",
       "      <td>[0.02629649912611577]</td>\n",
       "      <td>mnth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[-0.05380930040104529]</td>\n",
       "      <td>[-0.053623816058115725]</td>\n",
       "      <td>holiday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>[-0.09249631254573863]</td>\n",
       "      <td>[-0.08847349501436463]</td>\n",
       "      <td>hum</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>[-0.09730252159431736]</td>\n",
       "      <td>[-0.09601170470478232]</td>\n",
       "      <td>windspeed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>[-0.19883735613055464]</td>\n",
       "      <td>[-0.1971687824316214]</td>\n",
       "      <td>weathersit</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  coef_lr               coef_ridge     columns\n",
       "6    [0.6444774183345154]     [0.6275083240477665]       atemp\n",
       "0    [0.2932870174143726]     [0.2769855493322594]      season\n",
       "3   [0.03036364260009024]   [0.028434783389145735]     weekday\n",
       "4  [0.013813330209271746]   [0.014400902854231179]  workingday\n",
       "1  [0.009025595984837709]    [0.02629649912611577]        mnth\n",
       "2  [-0.05380930040104529]  [-0.053623816058115725]     holiday\n",
       "7  [-0.09249631254573863]   [-0.08847349501436463]         hum\n",
       "8  [-0.09730252159431736]   [-0.09601170470478232]   windspeed\n",
       "5  [-0.19883735613055464]    [-0.1971687824316214]  weathersit"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mse_mean = np.mean(ridge.cv_values_, axis = 0)\n",
    "plt.plot(np.log10(alphas), mse_mean.reshape(len(alphas),1)) \n",
    "\n",
    "plt.xlabel('log(alpha)')\n",
    "plt.ylabel('mse')\n",
    "plt.show()\n",
    "\n",
    "print ('alpha is:', ridge.alpha_)\n",
    "\n",
    "# Inspect each feature's weight; the absolute value of a coefficient can be read as that feature's importance\n",
    "fs = pd.DataFrame({\"columns\":list(columns), \"coef_lr\":list((lr.coef_.T)), \"coef_ridge\":list((ridge.coef_.T))})\n",
    "fs.sort_values(by=['coef_lr'],ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
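    "The cell above tries several alpha values as the hyperparameter and, for each one, averages the cross-validation squared error (`ridge.cv_values_`); note that the quantity averaged is the error, not the coefficients. The alpha with the lowest mean error is 10. With the L2 penalty constraining the model, nearly all coefficients in the table shrink compared with the unregularized fit; the exceptions are `workingday` and `mnth`, whose coefficients increase slightly."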
   ]
  },
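  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The selection step can also be sketched on its own. The cell below is a minimal, self-contained illustration on synthetic data, not this notebook's pipeline: `X_demo`, `y_demo`, `alphas_demo` and the helper `mean_cv_mse` are hypothetical names introduced only for this example. For each candidate alpha it fits `Ridge` with 5-fold cross-validation, averages the squared error, and keeps the alpha with the lowest mean. `RidgeCV` follows the same idea but, by default, uses an efficient leave-one-out scheme instead of 5 folds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import Ridge\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "# Synthetic regression data (assumption: stands in for X_train / y_train)\n",
    "rng = np.random.RandomState(0)\n",
    "X_demo = rng.normal(size=(200, 5))\n",
    "true_w = np.array([1.5, -2.0, 0.0, 0.5, 0.0])\n",
    "y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=200)\n",
    "\n",
    "alphas_demo = [0.01, 0.1, 1, 10, 100]\n",
    "\n",
    "def mean_cv_mse(alpha):\n",
    "    # cross_val_score returns negated MSE per fold, so flip the sign\n",
    "    scores = cross_val_score(Ridge(alpha=alpha), X_demo, y_demo,\n",
    "                             scoring='neg_mean_squared_error', cv=5)\n",
    "    return -scores.mean()\n",
    "\n",
    "# Mean cross-validated MSE for every candidate alpha\n",
    "mse_by_alpha = {a: mean_cv_mse(a) for a in alphas_demo}\n",
    "\n",
    "# The selected alpha is the one with the smallest mean error\n",
    "best_alpha_demo = min(mse_by_alpha, key=mse_by_alpha.get)\n",
    "print('mean CV MSE per alpha:', mse_by_alpha)\n",
    "print('best alpha:', best_alpha_demo)"
   ]
  },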
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.3 Regularized linear regression (L1 penalty --> Lasso)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of LassoCV on test is -0.7120505126908245\n",
      "The r2 score of LassoCV on train is 0.7597376967780604\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\linear_model\\coordinate_descent.py:1094: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "#### Lasso / L1 regularization\n",
    "# class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, \n",
    "#                                    normalize=False, precompute='auto', max_iter=1000, \n",
    "#                                    tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1,\n",
    "#                                    positive=False, random_state=None, selection='cyclic')\n",
    "from sklearn.linear_model import LassoCV\n",
    "\n",
    "# Set the hyperparameter search range\n",
    "#alphas = [ 0.01, 0.1, 1, 10,100]\n",
    "\n",
    "# Create a LassoCV instance\n",
    "#lasso = LassoCV(alphas=alphas)  \n",
    "lasso = LassoCV()  \n",
    "\n",
    "# Fit (cross-validation runs internally)\n",
    "lasso.fit(X_train, y_train)  \n",
    "\n",
    "# Predict on the test and training sets\n",
    "y_test_pred_lasso = lasso.predict(X_test)\n",
    "y_train_pred_lasso = lasso.predict(X_train)\n",
    "\n",
    "\n",
    "# Evaluate with r2_score on the test and training sets\n",
    "print ('The r2 score of LassoCV on test is', r2_score(y_test, y_test_pred_lasso))\n",
    "print ('The r2 score of LassoCV on train is', r2_score(y_train, y_train_pred_lasso))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
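    "The L1 penalty induces sparsity: the larger alpha is, the wider the interval [-alpha, alpha] inside which a coefficient is thresholded to 0, so the penalty is harsher and more coefficients become exactly 0. The defaults are used here: `eps=0.001` fixes the ratio of the smallest alpha to the largest, and `n_alphas=100` values are spread over that range, evenly on a log scale. The stopping criteria are also left at their defaults. The r2 score of Lasso turns out slightly worse than that of ridge regression; this does not mean Lasso is the weaker model in general, only that different datasets suit different models. Note that the test r2 printed above is negative (about -0.71, against 0.76 on the training set), meaning the model predicts the test set worse than a constant equal to the test-set mean would."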
   ]
  },
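  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the default alpha range concrete, the cell below is a rough sketch of how such a grid can be built, under the assumption that the largest alpha is the smallest value at which every Lasso coefficient is 0 (max |X^T y| / n on centered data) and that the grid is log-spaced down to `eps` times that value. The helpers `soft_threshold` and `default_alpha_grid` and the arrays `X_demo`, `y_demo` are hypothetical names for illustration; this is not scikit-learn's own code. `soft_threshold` shows the [-alpha, alpha] interval in action: inputs inside it are mapped to exactly 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def soft_threshold(z, alpha):\n",
    "    # Values with |z| <= alpha become exactly 0; the rest move toward 0 by alpha\n",
    "    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)\n",
    "\n",
    "def default_alpha_grid(X, y, eps=1e-3, n_alphas=100):\n",
    "    # Center the data, as a model with an intercept would\n",
    "    Xc = X - X.mean(axis=0)\n",
    "    yc = np.ravel(y) - np.mean(y)\n",
    "    # Smallest alpha for which all coefficients are 0\n",
    "    alpha_max = np.max(np.abs(Xc.T @ yc)) / X.shape[0]\n",
    "    # Log-spaced grid from alpha_max down to eps * alpha_max\n",
    "    return np.logspace(np.log10(alpha_max), np.log10(alpha_max * eps), num=n_alphas)\n",
    "\n",
    "# Synthetic data (assumption: stands in for X_train / y_train)\n",
    "rng = np.random.RandomState(0)\n",
    "X_demo = rng.normal(size=(100, 4))\n",
    "y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.1, size=100)\n",
    "\n",
    "grid = default_alpha_grid(X_demo, y_demo)\n",
    "print('largest alpha:', grid[0], 'smallest alpha:', grid[-1])\n",
    "print('ratio min/max:', grid[-1] / grid[0])\n",
    "print('soft-thresholded:', soft_threshold(np.array([-0.3, 0.05, 0.8]), 0.1))"
   ]
  },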
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl4VPd97/H3VzsIsQuziNVgY7DBjhXsxHZMYjuFuIUsjrektdPEtLdxkjZpb+0610ntprdJ2/S2DUkuyZNmdYjjxAmJiR3Hu9NAEDYYEAYEGCQESCySENo13/4xR/JY1jLYnDkzms/refRwlt+c8z0aNJ8553cWc3dEREQAcqIuQERE0odCQUREeikURESkl0JBRER6KRRERKSXQkFERHopFEREpJdCQUREeikURESkV17UBZypiRMn+qxZs6IuQ0Qko2zevPmYu5cO1S7jQmHWrFlUVFREXYaISEYxswPJtAvt8JGZfcvM6sxs+wDzzcz+w8yqzOwlM3tLWLWIiEhywuxT+DawbJD5y4F5wc8q4Gsh1iIiIkkILRTc/VngxCBNVgLf9bgNwFgzmxJWPSIiMrQozz6aBlQnjNcE00REJCJRhoL1M63fhzuY2SozqzCzivr6+pDLEhHJXlGGQg0wPWG8DKjtr6G7r3H3cncvLy0d8owqERF5g6IMhXXAnwRnIV0ONLr74QjrERHJeqFdp2BmPwSWAhPNrAb4HJAP4O5fB9YD7wGqgBbgI2HVIiKSybpjzj/9aid/tHgqi8rGhrqu0ELB3W8ZYr4DHw9r/SIiw0VVXTPfeG4/8yePDj0UdO8jEZE0t7W6AYDF08MNBFAoiIikvRerGygpymPOxOLQ16VQEBFJc1uqG7h4+lhycvo7k//sUiiIiKSxlo4udh89xeKQ+xJ6KBRERNLY9kNNdMeci1PQnwAKBRGRtLal+iSQmk5mUCiIiKS1rdWNTBs7gtKSwpSsT6EgIpLGtlQ3cPGM1OwlgEJBRCRt1Z1q41BDK5ek6NARKBRERNLW1upGIHX9CaBQEBFJW1uqT5KbY1w4dUzK1qlQEBFJU1uqG5g/uYQRBbkpW6dCQUQkDcVizkvVjSm7PqGHQkFEJA3tP36aU+1dKbuSuYdCQUQkDe05egqA+VNKUrpehYKISBqqqmsG4NzSUSldr0JBRCQN7alrZtrYERQXhvYstH4pFERE0lBVXTPnTkrtXgKEHApmtszMdplZlZnd1c/8mWb2hJm9ZGZPm1lZmPWIiGSCWMzZW9/M3BQfOoIQQ8HMcoHVwHJgAXCLmS3o0+xfgO+6+yLgPuD/hlWPiEimONTQSltnjHnnDKNQAJYAVe6+z907gLXAyj5tFgBPBMNP9TNfRCTr9HQyzx1mh4+mAdUJ4zXBtERbgQ8Ew+8DSsxsQog1iYikvd5QGE6Hj4D+Hibqfcb/GrjazF4ErgYOAV2vW5DZKjOrMLOK+vr6s1+piEgaqaprZuKoAsYVF6R83WGGQg0wPWG8DKhNbODute7+fne/BLgnmNbYd0Huvsbdy929vLS0NMSSRUSit6fuVMqvT+gRZihsAuaZ2WwzKwBuBtYlNjCziWbWU8PdwLdCrEdEJO25O1V1zZH0J0CIoeDuXcCdwGPATuBBd99hZveZ2Yqg2VJgl5ntBs4BvhBWPSIimaC+uZ2mti7mRRQKoV4q5+7rgfV9pt2bMPwQ8FCYNYiIZJJXzzxK7T2PeuiKZhGRNBLl6aigUBARSStVdc2UFOZxzujCSNavUBARSSM99zwy6++s/vApFERE0sieCM88AoWCiEjaaGztpP5Ue2RnHoFCQUQkbVTVxZ+2pj0FERFh5+GeR3COjqwGhYKISJqoPNzE6KI8po4piqwGhYKISJrYebiJBVNHR3bmESgURETSQnfMefnwKS6I8NARKBRERNLCgeOnae3sZoFCQUREKg83
AWhPQURE4v0JeTkWyXOZEykURETSQGVtE3MnjaIwLzfSOhQKIiJpYOfhU5H3J4BCQUQkcidOd3CkqS3y/gRQKIiIRG5n0Mm8YKpCQUQk61XWpseZRxByKJjZMjPbZWZVZnZXP/NnmNlTZvaimb1kZu8Jsx4RkXS083ATk0cXMb64IOpSwgsFM8sFVgPLgQXALWa2oE+zzwIPuvslwM3AV8OqR0QkXVUebuKCKdE8k7mvMPcUlgBV7r7P3TuAtcDKPm0c6NlfGgPUhliPiEjaae/qpqquOS36EwDyQlz2NKA6YbwGuKxPm88DvzazTwDFwLUh1iMiknb2HG2mK+Zp0Z8A4e4p9HebP+8zfgvwbXcvA94DfM/MXleTma0yswozq6ivrw+hVBGRaOyobQRg4dQxEVcSF2Yo1ADTE8bLeP3hoY8CDwK4+++AImBi3wW5+xp3L3f38tLS0pDKFRFJvS3VDYwZkc+sCSOjLgUINxQ2AfPMbLaZFRDvSF7Xp81B4BoAM7uAeChoV0BEssaLBxtYPH1spM9QSBRaKLh7F3An8Biwk/hZRjvM7D4zWxE0+wxwh5ltBX4I3O7ufQ8xiYgMSy0dXew+eoqLp4+NupReYXY04+7rgfV9pt2bMFwJXBFmDSIi6WpbTSMxh4unp0d/AuiKZhGRyGypbgBgcVn67CkoFEREIrK1poEZ40cyYVRh1KX0UiiIiERkS9DJnE4UCiIiEahraqO2sS2tOplBoSAiEokXg/4EhYKIiLC1uoG8HGNhmtzzqIdCQUQkAluqG7hgymiK8qN9JnNfCgURkRTrjjkv1TSm3aEjUCiIiKTcvvpmmtu70u7MI1AoiIik3IsH07OTGRQKIiIpt3H/CcYXF3BuaXHUpbyOQkFEJMU27j/Oklnj0+bOqIkUCiIiKXSooZWak61cNmd81KX0S6EgIpJCG/cdB+Cy2RMirqR/CgURkRTauO8EY0bkM39ySdSl9EuhICKSQhv3H+ets8aTk5N+/QmgUBARSZmjTW28cryFy9O0PwEUCiIiKbMh6E9YMjtLQ8HMlpnZLjOrMrO7+pn/b2a2JfjZbWYNYdYjIhKljftPMKowjwVT0usmeIlCe0azmeUCq4HrgBpgk5mtC57LDIC7/1VC+08Al4RVj4hI1DbuO075rHHk5abvQZowK1sCVLn7PnfvANYCKwdpfwvwwxDrERGJTP2pdvbWn07bU1F7hBkK04DqhPGaYNrrmNlMYDbwZIj1iIhE5vf7TwCk7UVrPcIMhf7Ot/IB2t4MPOTu3f0uyGyVmVWYWUV9ff1ZK1BEJFWerzpGcUEuF00bE3UpgwozFGqA6QnjZUDtAG1vZpBDR+6+xt3L3b28tLT0LJYoIhI+d+fZ3fVcMXci+WncnwDhhsImYJ6ZzTazAuIf/Ov6NjKz84FxwO9CrEVEJDJVdc0camjl6vPT/0ttaKHg7l3AncBjwE7gQXffYWb3mdmKhKa3AGvdfaBDSyIiGe2Z3fHD3lefl/6hENopqQDuvh5Y32favX3GPx9mDSIiUXt6Vz1zJ42ibNzIqEsZUnof3BIRyXAtHV38fv8JlmbAXgIoFEREQvW7vcfp6I5lRH8CKBREREL1zO56RuTn8tZZ6X19Qg+FgohIiJ7ZXc/bzp1AUX5u1KUkRaEgIhKS/cdOc+B4C0sz5NARKBRERELz9K46IDNORe2hUBARCcnjlUeZU1rMzAnFUZeSNIWCiEgITpzuYOP+Eyy/cHLUpZwRhYKISAgerzxCd8xZfuGUqEs5I0mHgpldaWYfCYZLzWx2eGWJiGS2X20/Qtm4ESycmr5PWetPUqFgZp8D/ha4O5iUD3w/rKJERDJZY2snv606xvILJ2PW31ME0leyewrvA1YApwHcvRYoCasoEZFM9sTOo3R2O8sy7NARJB8KHcFdTB3AzDKnK11EJMV+tf0I54wu5JLpY6Mu5YwlGwoPmtn/B8aa2R3Ab4BvhFeWiEhm
Ot3exbO761m2cDI5OZl16AiSvHW2u/+LmV0HNAHnA/e6++OhViYikoGe2lVHe1eM5Rdl3qEjSDIUgsNFT7r748GT0s43s3x37wy3PBGRzLJ+22EmjirImBvg9ZXs4aNngUIzm0b80NFHgG+HVZSISCZqauvkNzvruP6iKeRm4KEjSD4UzN1bgPcD/+nu7wMWhFeWiEjmeXT7ETq6Yqy8ZFrUpbxhSYeCmb0N+BDwSDAt1Ed5iohkmp9vOcTMCSMz8qyjHsmGwqeAu4CfuvuO4GrmJ4d6kZktM7NdZlZlZncN0OZGM6s0sx1m9kDypYuIpI8jjW38997jrLx4WsZdsJYo2W/7LUAMuMXMPgwYwTULAzGzXGA1cB1QA2wys3XuXpnQZh7xq6SvcPeTZjbpDWyDiEjkfrG1Fnd478VToy7lTUk2FH4A/DWwnXg4JGMJUOXu+wDMbC2wEqhMaHMHsNrdTwK4e12SyxYRSSs/23KIRWVjmFM6KupS3pRkDx/Vu/sv3H2/ux/o+RniNdOA6oTxmmBaovOA88zst2a2wcyW9bcgM1tlZhVmVlFfX59kySIiqbHn6Cl21Dbx3oszt4O5R7J7Cp8zs28CTwDtPRPd/aeDvKa/g2p9DznlAfOApUAZ8JyZXejuDa95kfsaYA1AeXn5oIetRERS7WdbDpFj8IeLM/OCtUTJhsJHgPnE747ac/jIgcFCoQaYnjBeBtT202ZDcBHcfjPbRTwkNiVZl4hIpDq7Yzy0uYar5pUyqaQo6nLetGRDYbG7X3SGy94EzAvOVDoE3Azc2qfNz4BbgG+b2UTih5P2neF6REQi88TOoxxtaucf3jsz6lLOimT7FDaY2RldrObuXcCdwGPATuDB4HTW+8xsRdDsMeC4mVUCTwF/4+7Hz2Q9IiJR+v6Gg0wdU8S75g+PkyeT3VO4ErjNzPYT71MwwN190WAvcvf1wPo+0+5NGHbg08GPiEhG2X/sNM9XHeMz152Xsbe16CvZUOj3rCARkWz2gw0HyMsxbloyfejGGSLZW2cPdfqpiEhWaevs5seba/iDhZOHRQdzj2T7FEREJMEvttbS2NrJhy6fEXUpZ5VCQUTkDfj+hgOcW1rM2+ZMiLqUs0qhICJyhl48eJKtNY3c9vZZGX3zu/4oFEREztC3//sVSgrzeP9byqIu5axTKIiInIGjTW088tJhPlg+nVGFw++xMgoFEZEz8IONB+l250/eNjyuYO5LoSAikqT2rm4e2HiAd50/iVkTi6MuJxQKBRGRJK3fdphjzR3cfsWsqEsJjUJBRCQJ7s63nn+Fc0uLuXLuxKjLCY1CQUQkCc/uOca2Q4187Ko5w+401EQKBRGRIbg7//nEHqaMKeIDw/A01EQKBRGRIWzcf4KKAyf586vPpSBveH9sDu+tExE5C77yZBWlJYXc9NbhczfUgSgUREQG8cLBkzxfdYxVV82hKD836nJCp1AQERnEV56sYtzIfG69bHjdDXUgCgURkQFs2HecJ1+u42NXzaF4GN7Soj+hhoKZLTOzXWZWZWZ39TP/djOrN7Mtwc/HwqxHRCRZsZjzD49UMnVMER+9cnbU5aRMaNFnZrnAauA6oAbYZGbr3L2yT9MfufudYdUhIvJG/GzLIbYfauL/3XRxVvQl9AhzT2EJUOXu+9y9A1gLrAxxfSIiZ0VrRzdfenQXi8rGsGLx1KjLSakwQ2EaUJ0wXhNM6+sDZvaSmT1kZv2e72Vmq8yswswq6uvrw6hVRKTXN5/bx5GmNj57/QJycobv1cv9CTMU+vtNep/xXwCz3H0R8BvgO/0tyN3XuHu5u5eXlpae5TJFRF5Vc7KFrz2zl2ULJ7Nk9vioy0m5MEOhBkj85l8G1CY2cPfj7t4ejH4DuDTEekREBuXu/N3D2zHgs394QdTlRCLMUNgEzDOz2WZWANwMrEtsYGZTEkZXADtDrEdEZFA/eeEQz+6u52+Xz6ds3Mioy4lEaGcfuXuX
md0JPAbkAt9y9x1mdh9Q4e7rgE+a2QqgCzgB3B5WPSIig6k71cb9v6ykfOY4PnzZ8HyqWjJCvRrD3dcD6/tMuzdh+G7g7jBrEBFJxufX7aC1s5sv3rAo6zqXE+mKZhHJeuu21rJ+2xE+dc08zi0dFXU5kVIoiEhWO9LYxmcf3sYlM8byZ++YE3U5kVMoiEjWisWcv3loK53dzpdvvJi8XH0k6jcgIlnrexsO8NyeY9xz/QXMnlgcdTlpQaEgIlmp+kQL/7h+J0vPL+VDWXJb7GQoFEQkK/3b47sB+Mf3XYRZ9p5t1JdCQUSyzs7DTTy85RC3XzGLqWNHRF1OWlEoiEjW+dKjL1NSmMdfXD036lLSjkJBRLLKxn3HeWpXPf9r6VzGjMyPupy0o1AQkazh7vzToy9zzuhCbn/7rKjLSUsKBRHJGuu3HeHFgw385bXnMaIge56mdiYUCiKSFVo7uvnCI5VcMGU0N5b3+zwvQaEgIlnia8/spbaxjb9fsZDcLL7h3VAUCiIy7FWfaOHrz+xlxeKpWfk0tTOhUBCRYe8Lj+wk14y73zM/6lLSnkJBRIa1p3fV8eiOI9z5rrlMGaML1YaiUBCRYauxtZO7frKNeZNG8dErZ0ddTkYI9clrIiJRuv+XldQ3t7PmTy6lKF+noCYj1D0FM1tmZrvMrMrM7hqk3Q1m5mZWHmY9IpI9nth5lIc21/AXS89lUdnYqMvJGKGFgpnlAquB5cAC4BYzW9BPuxLgk8DGsGoRkezS0NLBXT/dxvzJJXziXfOiLiejhLmnsASocvd97t4BrAVW9tPufuBLQFuItYhIFvnCIzs5ebqDf71xMQV56jo9E2H+tqYB1QnjNcG0XmZ2CTDd3X8ZYh0ikkX+e+8xfry5hjveMYeFU8dEXU7GCTMU+rtk0HtnmuUA/wZ8ZsgFma0yswozq6ivrz+LJYrIcNLW2c09D29n5oSRfOoaHTZ6I8IMhRog8QYjZUBtwngJcCHwtJm9AlwOrOuvs9nd17h7ubuXl5aWhliyiGSy1U9Vsf/Yab7w3ot0ttEbFGYobALmmdlsMysAbgbW9cx090Z3n+jus9x9FrABWOHuFSHWJCLD1O6jp/j6M3t5/yXTuHLexKjLyVihhYK7dwF3Ao8BO4EH3X2Hmd1nZivCWq+IZJ+OrhiffnALJUX53HP9BVGXk9FCvXjN3dcD6/tMu3eAtkvDrEVEhq//fHIP2w818fUPX8qEUYVRl5PRdK6WiGS0zQdOsvqpKm64tIxlF06OupyMp1AQkYx1ur2Lzzy4hSljRvC5P3rdtbHyBujeRyKSkdyd//Pz7Rw40cLaOy6npCg/6pKGBe0piEhG+u7vDvDTFw7xqWvmcdmcCVGXM2woFEQk42zcd5z7f1nJtRecwyd1b6OzSqEgIhnlcGMrH3/gBWZMGMmXb1pMjp63fFYpFEQkYzS2dvKR/9pEa0c3a/74UkarH+GsU0eziGSE1o5uPvadTeytb+a/bl/C3EklUZc0LCkURCTtdXbHuPOBF6g4cJKv3PIW3cYiRDp8JCJpras7xmce3MoTL9dx/8oLuX7RlKhLGta0pyAiaau9q5tPPPAiv648yl3L5/Phy2dGXdKwp1AQkbTU2tHNqu9V8NyeY/z9ioXc9vZZUZeUFRQKIpJ26k+182ffq2BLdQNfumERN5ZPH/pFclYoFEQkrVTWNnHHdys4frqd1be+heUXqQ8hlRQKIpI2Ht1+mE8/uJXRRfk89Odv58JpesZyqikURCRyrR3d/MMjlfxg40EWTx/LN/74UiaNLoq6rKykUBCRSFXWNvHJtS9SVdfMqnfM4TPvPo/CPD1fOSoKBRGJRGtHN//+xB6++dw+xhUX8L2PLuGqeaVRl5X1Qg0FM1sG/DuQC3zT3f+pz/w/Bz4OdAPNwCp3rwyzJhGJlrvz1K467v35DmpOtnLDpWX83XsuYHxxQdSlCSGGgpnlAquB64AaYJOZrevz
of+Au389aL8C+DKwLKyaRCRaW6ob+OKvXuZ3+45zbmkxa1ddzuV6FkJaCXNPYQlQ5e77AMxsLbAS6A0Fd29KaF8MeIj1iEhEttU0svqpKh7dcYQJxQV8/o8WcOtlMynI05120k2YoTANqE4YrwEu69vIzD4OfBooAN4VYj0ikkKxmPNc1THWPLuX31Ydp6Qwj09dM4873jGHUYXqzkxXYb4z/T354nV7Au6+GlhtZrcCnwVue92CzFYBqwBmzJhxlssUkbOpsaWTh16o4QcbDrDv2GkmlRRy9/L53HLZDD3/IAOEGQo1QOK16WVA7SDt1wJf62+Gu68B1gCUl5frEJNImunsjvHs7noefvEQj1cepb0rxltmjOXLNy7m+kVTdIppBgkzFDYB88xsNnAIuBm4NbGBmc1z9z3B6PXAHkQkI7R1dvP8nmP8uvIIv9lZx4nTHYwbmc+N5dO5ecl0Fk7V1ciZKLRQcPcuM7sTeIz4KanfcvcdZnYfUOHu64A7zexaoBM4ST+HjkQkPcRizp66Zp7bU8/zVcfYuO8ErZ3dlBTmsXT+JFYunso7zitV53GGM/fMOhpTXl7uFRUVUZchMuw1tnSyo7aRLTUNbH7lJJsPnqShpROAOaXFXDl3ItdecA6Xz5mgIMgAZrbZ3cuHaqdTAESyXGNrJ9UnWthb38yeo83sPnqKl4+c4uCJlt4255YW8wcLJnPprHFcMXci08aOiLBiCZNCQWQYi8WcI01t1Jxs5XBjK7UNbdQ2vDp8qKGVxtbO3va5OcbMCSNZOHU0N711OhdNG8NF08YwTlcbZw2FgkiGa+vspuZkCwdPtHDweAsHT7Ry8EQLB46f5sCJFjq6Yq9pP7ooj6ljRzB5TBGXzBjLzAkjmTF+JDMnFDOntFhnCmU5hYJImnJ3mtu7ONbcwbHmdo42tXG0Kf7voYZWahtaOXSylbpT7a953ciCXKaPG8nsicW8c/4kZk4YyfRxI5k6togpY0ZQrAvHZBD63yESoo6uGKfaOmlu7+JUWxdNrZ00tXXS1NZFc1tXML2TptYumto6aWzt5GRLJydPd3CypYP2Pt/yAQrycpg6pohp40aw9PxSpo8byYwJIykbF//GP3FUAWb9XTsqMjSFgmQcdyfmEHOnO+Z4z7A7HoPuYHrP/MThmDtdMaerO/5vdyxGZ3d8vDMWi0/vjtHRHZ/e0RWjo6ubju4Y7Z0x2rtitHd109YZo62zm9bObto6u2npiP+0dnRzuqOL0+1dnG6Pv24oRfk5jBmRz+iifEaPyGfa2BFcOHU044sLmDCqgImjCpkwqpBzRhdyTkkRY0fm60NfQpM1ofDAxoN89emqAecn8zdm/d65I/nlJPNnPNAf+4CvHWDGQO3P9MPE3V+9N4nH71PSM63nbGYn/sHcOx58aPdO73lNMBzrGfY+84NlJi6vZzjWZ51RyTEoys+lIC+HEfm5jMjPpTA/l+KCXEqK8pg8uoiRhbmMKsxjZEEeowpzKSnKZ1RhHqOK8no//EuK8hhdlE9xYS55uTqdU9JH1oTClDFFLJk9vv+ZSXzQJPNZNNg1H8m9/sxeO9D6BlxXkh+ojr82AO3VkDGLz7FgWk/IWG87I8d65lv8X7Pe9jkJw69OT5ifY69ZVuLrcoIVG/GzZHKCZeeYkZvT0yY+PTfHyMkxcnvnW++0/GA4L9fIy8khL8fIy80hL9fIz4n/W5CXQ0FuDvm5ORTm5cTH8+Jt9S1dhrOsCYV3zp/EO+dPiroMEZG0pv1WERHppVAQEZFeCgUREemlUBARkV4KBRER6aVQEBGRXgoFERHppVAQEZFeGffkNTOrBw6kYFUTgWMpWE8qaFvS03DZluGyHTC8t2Wmu5cO9aKMC4VUMbOKZB5dlwm0LelpuGzLcNkO0LaADh+JiEgChYKIiPRSKAxsTdQFnEXalvQ0XLZluGwHaFvUpyAiIq/SnoKIiPRSKATM7H4ze8nMtpjZr81s
6gDtbjOzPcHPbamuMxlm9s9m9nKwPQ+b2dgB2r1iZtuCba5IdZ3JOINtWWZmu8ysyszuSnWdQzGzD5rZDjOLmdmAZ4RkyHuS7Lak9XsCYGbjzezx4O/5cTMbN0C77uA92WJm61Jd50CG+h2bWaGZ/SiYv9HMZg250PgjEfUDjE4Y/iTw9X7ajAf2Bf+OC4bHRV17P3W+G8gLhr8IfHGAdq8AE6Ou981uC5AL7AXmAAXAVmBB1LX3qfEC4HzgaaB8kHaZ8J4MuS2Z8J4EdX4JuCsYvmuQv5XmqGt9I79j4C96PsuAm4EfDbVc7SkE3L0pYbSY/h9e+QfA4+5+wt1PAo8Dy1JR35lw91+7e1cwugEoi7KeNyPJbVkCVLn7PnfvANYCK1NVYzLcfae774q6jrMhyW1J+/cksBL4TjD8HeC9EdZyppL5HSdu30PANTbE82QVCgnM7AtmVg18CLi3nybTgOqE8ZpgWjr7U+BXA8xz4NdmttnMVqWwpjdqoG3JxPdlIJn2ngwkU96Tc9z9MEDw70DP7C0yswoz22Bm6RIcyfyOe9sEX64agQmDLTRrntEMYGa/ASb3M+sed/+5u98D3GNmdwN3Ap/ru4h+XhvJ6VtDbUvQ5h6gC/jBAIu5wt1rzWwS8LiZvezuz4ZT8cDOwrakxfuSzHYkIWPek6EW0c+0tPtbOYPFzAjelznAk2a2zd33np0K37Bkfsdn/D5kVSi4+7VJNn0AeITXh0INsDRhvIz4cdWUG2pbgk7wPwSu8eCAYj/LqA3+rTOzh4nvjqb8A+gsbEsNMD1hvAyoPXsVJucM/n8NtoyMeE+SkBbvCQy+LWZ21MymuPthM5sC1A2wjJ73ZZ+ZPQ1cQvx4fpSS+R33tKkxszxgDHBisIXq8FHAzOYljK4AXu6n2WPAu81sXHCWwruDaWnFzJYBfwuscPeWAdoUm1lJzzDxbdmeuiqTk8y2AJuAeWY228wKiHeopc0ZIsnKlPckSZnynqwDes4ivA143V5Q8PdeGAxPBK4AKlNW4cCS+R0nbt8NwJMDfUnsFXUPerr8AD8h/gf4EvALYFowvRz4ZkK7PwWqgp+PRF33ANtSRfw44pbgp+fsg6nA+mB4DvGzFbYCO4gfFojuYUGDAAADFElEQVS89jeyLcH4e4DdxL+9pd22AO8j/q2tHTgKPJbB78mQ25IJ70lQ4wTgCWBP8O/4YHrv3z3wdmBb8L5sAz4add2D/Y6B+4h/iQIoAn4c/B39Hpgz1DJ1RbOIiPTS4SMREemlUBARkV4KBRER6aVQEBGRXgoFERHppVCQrGFmzW/y9Q8FV7QO1ubpwe4cmmybPu1LzezRZNuLvBkKBZEkmNlCINfd96V63e5eDxw2sytSvW7JPgoFyToW989mtj14dsFNwfQcM/tq8KyAX5rZejO7IXjZh0i42tXMvhbcIG2Hmf39AOtpNrN/NbMXzOwJMytNmP1BM/u9me02s6uC9rPM7Lmg/Qtm9vaE9j8LahAJlUJBstH7gYuBxcC1wD8H9715PzALuAj4GPC2hNdcAWxOGL/H3cuBRcDVZraon/UUAy+4+1uAZ3jtvbTy3H0J8JcJ0+uA64L2NwH/kdC+ArjqzDdV5Mxk1Q3xRAJXAj90927gqJk9A7w1mP5jd48BR8zsqYTXTAHqE8ZvDG5tnRfMW0D8FimJYsCPguHvAz9NmNczvJl4EAHkA18xs4uBbuC8hPZ1xG8jIRIqhYJko4EeMjLYw0daid9HBjObDfw18FZ3P2lm3+6ZN4TEe8q0B/928+rf4V8Rv5fQYuJ78W0J7YuCGkRCpcNHko2eBW4ys9zgOP87iN8s7HngA0Hfwjm89jbpO4G5wfBo4DTQGLRbPsB6cojfmRLg1mD5gxkDHA72VP6Y+OMWe5xH5t4xVTKI9hQkGz1MvL9gK/Fv7//b3Y+Y2U+Aa4h/+O4GNhJ/UhXEn6+xFPiNu281sxeJ38l0H/DbAdZzGlho
ZpuD5dw0RF1fBX5iZh8Engpe3+OdQQ0iodJdUkUSmNkod282swnE9x6uCAJjBPEP6iuCvohkltXs7qPOUl3PAis9/mxwkdBoT0HktX5pZmOBAuB+dz8C4O6tZvY54s+8PZjKgoJDXF9WIEgqaE9BRER6qaNZRER6KRRERKSXQkFERHopFEREpJdCQUREeikURESk1/8ARVfINZxCXYsAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "alpha is: 0.004151794259440714\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>coef_lasso</th>\n",
       "      <th>coef_lr</th>\n",
       "      <th>coef_ridge</th>\n",
       "      <th>columns</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.641265</td>\n",
       "      <td>[0.6444774183345154]</td>\n",
       "      <td>[0.6275083240477665]</td>\n",
       "      <td>atemp</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.292394</td>\n",
       "      <td>[0.2932870174143726]</td>\n",
       "      <td>[0.2769855493322594]</td>\n",
       "      <td>season</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.026580</td>\n",
       "      <td>[0.03036364260009024]</td>\n",
       "      <td>[0.028434783389145735]</td>\n",
       "      <td>weekday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.010461</td>\n",
       "      <td>[0.013813330209271746]</td>\n",
       "      <td>[0.014400902854231179]</td>\n",
       "      <td>workingday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.006054</td>\n",
       "      <td>[0.009025595984837709]</td>\n",
       "      <td>[0.02629649912611577]</td>\n",
       "      <td>mnth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.050648</td>\n",
       "      <td>[-0.05380930040104529]</td>\n",
       "      <td>[-0.053623816058115725]</td>\n",
       "      <td>holiday</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>-0.086233</td>\n",
       "      <td>[-0.09249631254573863]</td>\n",
       "      <td>[-0.08847349501436463]</td>\n",
       "      <td>hum</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>-0.093276</td>\n",
       "      <td>[-0.09730252159431736]</td>\n",
       "      <td>[-0.09601170470478232]</td>\n",
       "      <td>windspeed</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-0.198828</td>\n",
       "      <td>[-0.19883735613055464]</td>\n",
       "      <td>[-0.1971687824316214]</td>\n",
       "      <td>weathersit</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   coef_lasso                 coef_lr               coef_ridge     columns\n",
       "6    0.641265    [0.6444774183345154]     [0.6275083240477665]       atemp\n",
       "0    0.292394    [0.2932870174143726]     [0.2769855493322594]      season\n",
       "3    0.026580   [0.03036364260009024]   [0.028434783389145735]     weekday\n",
       "4    0.010461  [0.013813330209271746]   [0.014400902854231179]  workingday\n",
       "1    0.006054  [0.009025595984837709]    [0.02629649912611577]        mnth\n",
       "2   -0.050648  [-0.05380930040104529]  [-0.053623816058115725]     holiday\n",
       "7   -0.086233  [-0.09249631254573863]   [-0.08847349501436463]         hum\n",
       "8   -0.093276  [-0.09730252159431736]   [-0.09601170470478232]   windspeed\n",
       "5   -0.198828  [-0.19883735613055464]    [-0.1971687824316214]  weathersit"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mses = np.mean(lasso.mse_path_, axis = 1)\n",
    "plt.plot(np.log10(lasso.alphas_), mses) \n",
    "#plt.plot(np.log10(lasso.alphas_)*np.ones(3), [0.3, 0.4, 1.0])\n",
    "plt.xlabel('log(alpha)')\n",
    "plt.ylabel('mse')\n",
    "plt.show()\n",
    "\n",
    "print('alpha is:', lasso.alpha_)\n",
    "\n",
    "# Inspect each feature's weight; the absolute value of a coefficient can be read as that feature's importance\n",
    "fs = pd.DataFrame({\"columns\":list(columns), \"coef_lr\":list((lr.coef_.T)), \"coef_ridge\":list((ridge.coef_.T)), \"coef_lasso\":list((lasso.coef_.T))})\n",
    "fs.sort_values(by=['coef_lr'],ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
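    "The cross-validated optimum is alpha = 0.01927092187407769, which is large enough to apply a noticeable L1 penalty. In the table above, the Lasso coefficients of `workingday` and `mnth` are shrunk to about 0.010 and 0.006, the smallest magnitudes of all features, although they are not driven exactly to zero. Lasso therefore assigns these two features little predictive weight, which supports the guess made from the ordinary least-squares fit earlier. A reasonable next step is to drop both features, retrain the ridge regression model, and compare its predictive performance."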
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
